
The development of integrated storage-computing hardware architecture in the era of artificial intelligence - Dr. Zhang Xin

2020-12-16

Technological advances have reduced the cost of computing to the point where it can be built into almost anything, so we now live in a world surrounded by computing devices. They power our searches on Google, connect us with friends on Facebook, answer our questions through Siri, and entertain us on YouTube; they are found in devices throughout our homes, in household appliances, in cars, in the workplace, and even in the greeting cards we send to each other. We have become so accustomed to computing getting faster, cheaper, and more power-efficient that we simply assume this trend will continue. Today, even glasses [1] and smart watches [2] embed the functions of a smartphone [3].

Scaling computing performance has never been easy, but over the past decade many factors have made it increasingly difficult, and power consumption has become a major constraint on performance. The rapid informatization of society has brought ever-growing volumes of data and ever-higher demands on computing speed, which challenge the capabilities of modern computers. With Moore's law in crisis [4] and parallel computing reaching its limits, hardware architecture can no longer rely heavily on the traditional approach of adding CPUs when the computational load is large and piling up memory when the storage demand is large. We can boldly ask whether the von Neumann architecture, which separates storage from computing and has served human computing needs efficiently, is in fact mismatched with contemporary artificial intelligence (AI) and other advanced technologies, and whether it has reached today's heights only because scientists and engineers have continuously improved its performance through powerful iterative optimization. The integration of storage and computing attempts to eliminate the energy-intensive and time-consuming data movement that plagues the separation of computing and storage in the current von Neumann architecture by building systems that perform computation directly in memory. The growing power cost of memory access, the heavy data-access demands of AI applications, and other factors make the integration of storage and computing an inevitable trend [5].

Integrating storage and computing requires integrating processors and memories, but at this stage processors and mainstream memories are manufactured with different processes. If memory functions are implemented in a processor process, the storage density of the memory may fall; conversely, if processor functions are implemented in a memory process, the operating speed of the processor may suffer. This contradiction cannot be resolved for the time being. The emergence of new non-volatile memory devices (also known as memristors) that can be scaled down to nanoscale dimensions offers a glimmer of hope [6,7]. Although their process standards are not yet mature, many experts believe that, because they naturally combine storage and computing, they are the best devices for building integrated storage-computing systems [8].

As shown in Figure 1, traditional computing architecture faces severe challenges, including the heat wall, the memory wall, and the end of Moore's law. The development of memristor technology may offer an alternative path, making possible hybrid memory-logic integration, biologically inspired computing, and efficient reconfigurable integrated storage-computing systems [9]. In the figure, CMOS stands for complementary metal-oxide-semiconductor, GPU for graphics processing unit, and CPU for central processing unit. Both GPUs and CPUs are traditional computing units built in CMOS technology on the von Neumann architecture, in which memory and computing are separated.




Figure 1 The competition for future computing solutions [9]

Cross-point array structures built from new non-volatile devices (such as resistive random access memory, RRAM [6,10], and phase change memory, PCM [7,11]) naturally provide a hardware accelerator for analog matrix-vector multiplication (MVM), the core operation in AI, and thus offer a promising way to overcome the limitations of existing computing methods based on the von Neumann architecture [12]. Research has shown that, through analog data storage and physical computation in memory (Kirchhoff's law and Ohm's law), analog circuits based on cross-point arrays can solve a wide range of algebraic problems, such as matrix-vector multiplication, linear equations, and matrix eigenvectors, in one step, without time-consuming and energy-consuming iterative operations [5,13]. Figure 2 illustrates the concept of matrix-vector multiplication in a cross-point array, where $V_j$ is the voltage applied to the j-th column, j = 1, 2, 3, ..., N, and N is the total number of columns in the array. The current induced in each cell by the applied voltage flows into the grounded row, and the total current generated in the i-th row is

$$I_i = \sum_{j=1}^{N} G_{ij} V_j \qquad (1)$$

where $G_{ij}$ is the conductance of the cell in the i-th row and j-th column. Equation (1) is the analog product of the conductance matrix $G_{ij}$ and the voltage vector $V_j$, which realizes the MVM operation in hardware. Unlike the time-consuming and laborious digital multiply-accumulate operations in traditional computers, the analog MVM in a cross-point array is completed in a single step thanks to Ohm's law and Kirchhoff's law [5], which gives it great advantages and application potential.
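
The relation in Equation (1) can be mimicked in software as a minimal sketch. The function name `crossbar_mvm` and the conductance and voltage values below are illustrative assumptions, not taken from the cited work, and the model is an ideal array with no wire resistance or device noise.

```python
def crossbar_mvm(G, V):
    """Row currents of an ideal cross-point array (hypothetical helper).

    G[i][j]: conductance in siemens of the cell at row i, column j.
    V[j]:    voltage in volts applied to column j.
    """
    if any(len(row) != len(V) for row in G):
        raise ValueError("each row of G needs one conductance per column voltage")
    currents = []
    for row in G:
        # Ohm's law gives each cell current G_ij * V_j;
        # Kirchhoff's current law sums them on the grounded row wire.
        currents.append(sum(g * v for g, v in zip(row, V)))
    return currents


# Assumed example values: a 2 x 3 array with microsiemens conductances.
G = [[1e-6, 2e-6, 3e-6],
     [4e-6, 5e-6, 6e-6]]
V = [0.1, 0.2, 0.3]

I = crossbar_mvm(G, V)
print(I)  # row currents in amperes
```

In the physical array all of these products and sums happen at once in the analog domain; the explicit loops here only reproduce the arithmetic, not the one-step behavior.
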
Although iteration-free operation is a very attractive feature for fast computation, and in many cases suits linear algebra problems that must be solved quickly, with a low energy budget and sufficient fault tolerance, the stability and accuracy of current implementations cannot match the stability of high-precision digital computers or the accuracy of floating-point solutions. Artificial neural network computations, which use regular low-precision integer arithmetic, are well suited to analog MVM. Starting from neuromorphic computing as the first practical application scenario, successive technical iterations may in the future achieve accuracy and stability comparable to digital computing, with higher energy efficiency.
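
To make the accuracy trade-off concrete, the toy model below compares a floating-point MVM with one whose weights are restricted to a small number of levels and perturbed by noise. Everything in it is an assumption made for illustration: the helper names, the 16 evenly spaced levels, the additive Gaussian noise model, and the sample values. It is not a model of any particular device.

```python
import random


def exact_mvm(W, x):
    """Reference result computed in ordinary floating point."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def quantize(w, levels, w_max=1.0):
    """Snap a weight in [0, w_max] to the nearest of `levels` evenly spaced states."""
    step = w_max / (levels - 1)
    return round(w / step) * step


def analog_mvm(W, x, levels=16, noise_sigma=0.0, seed=0):
    """Toy analog MVM: quantized weights plus assumed additive Gaussian noise per cell."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    out = []
    for row in W:
        acc = 0.0
        for w, xj in zip(row, x):
            g = quantize(w, levels) + rng.gauss(0.0, noise_sigma)
            acc += g * xj
        out.append(acc)
    return out


# Assumed sample weights and inputs, normalized to [0, 1].
W = [[0.12, 0.50, 0.93],
     [0.33, 0.71, 0.05]]
x = [1.0, 0.5, 0.25]

reference = exact_mvm(W, x)
coarse = analog_mvm(W, x, levels=16)                   # quantization only
noisy = analog_mvm(W, x, levels=16, noise_sigma=0.02)  # quantization plus noise

for name, result in (("coarse", coarse), ("noisy", noisy)):
    errors = [abs(a - e) for a, e in zip(result, reference)]
    print(name, "max abs error:", max(errors))
```

Without noise, each output can deviate from the reference by at most half a quantization step times the sum of the input magnitudes, which is the kind of bounded error that low-precision neural network inference is often able to tolerate.
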


Figure 2 Schematic diagram of analog computation in a cross-point array [5]

Overall, improving storage and computing capabilities will always be an important direction of development. The emergence of new non-volatile devices and the development of memristor technology have made the integrated storage-computing architecture a main direction for improving computing capability in the future. Studies have shown that memristor-based cross-point array chips can complete image recognition tasks in laboratory tests, with lower power consumption and higher speed than traditional architectures that separate memory and computing [14]. Computing and storage will tend to converge in the future, but it will take time for this to be realized, let alone widely adopted [8].

References:

[1] http://www.google.com/glass/start/

[2] https://getpebble.com

[3] Horowitz M. Computing’s energy problem (and what we can do about it). 2014 IEEE Int. Solid-State Circuits Conf. Digest Tech. Papers (ISSCC).

[4] Waldrop M M. The chips are down for Moore’s law. Nature, 2016, 530(7589): 144-147.

[5] Ielmini D, et al. In-memory computing with resistive switching devices. Nat. Electron., 2018, 1(6): 333-343.

[6] Strukov D B, Snider G S, et al. The missing memristor found. Nature, 2008, 453: 80-83.

[7] Sebastian A, et al. Crystal growth within a phase change memory cell. Nature Communications, 2014, 5(4314): 1-9.

[8] http://www.sangfor.net/about/source-news-product-news/1852.html

[9] Zidan M A, Strachan J P, Lu W D. The future of electronics based on memristive systems. Nature Electronics, 2018, 1(1): 22-29.

[10] Li C, et al. Analogue signal and image processing with large memristor crossbars. Nat. Electron., 2018, 1(1): 52-59.

[11] Ding K K, et al. Phase-change heterostructure enables ultralow noise and drift for memory operation. Science, 2019, 366(6462): 210-215.

[12] Lin Yudeng et al. In-memory calculation based on a new type of memristor. Micronanoelectronics and Intelligent Manufacturing, 2019, 1(2): 35-46.

[13] Sun Z, et al. Solving matrix equations in one step with cross-point resistive arrays. PNAS, 2019, 116(10): 4123-4128.

[14] Yao P, Wu H, Gao B, et al. Fully hardware-implemented memristor convolutional neural network. Nature, 2020, 577: 641-646.