February 17, 2022, by Gary Dagastine

Artificial intelligence (AI) technology has made great strides in recent years, evolving from limited use in a small number of applications into an essential enabler of the systems that now pervade our lives. "Smart" thermostats, doorbells and voice assistants; semi-autonomous vehicles; medical monitoring devices with predictive capabilities; and a myriad of other applications in many fields now rely on AI technology.

But AI and its specialized subsets (machine learning, deep learning and neuromorphic computing) have an Achilles' heel that stands in the way of further progress: a huge and growing energy appetite. As AI computing becomes more demanding and its overall use grows, the amount of energy required for AI computations and data transport is rapidly increasing, leading to excessive use of energy resources and a significantly larger global carbon footprint. This growth in energy usage is unsustainable.

Consider data centers, which make heavy use of AI. In 2017 they consumed about three percent of all the electrical power in the U.S., but by 2020 that share had doubled to six percent, and there is no end in sight. Industry projections say that by 2041, data centers would theoretically consume the world's entire energy output if today's inefficient compute architectures were still in use.

AI's energy challenge isn't restricted to data centers. Battery-powered Internet of Things (IoT) devices at the network edge also have large power requirements in the aggregate. As more AI processing moves to the edge, increasingly sophisticated IoT devices must become much more efficient so that their lithium-ion batteries can power more functions, last longer and/or be made physically smaller. That would also help reduce the growing volume of potentially hazardous Li-ion waste from discarded batteries.

GlobalFoundries (GF) has aligned its product roadmap to address the AI energy challenge by incorporating a series of technical innovations into its 12LP/12LP+ FinFET solution (used in data centers and IoT edge servers) and its 22FDX® FD-SOI solution (used at the IoT edge). In addition, GF is working with leading AI researchers to develop new, more efficient computing architectures and algorithms to open up new AI horizons.

A Paradigm Change for AI

An AI system gathers large amounts of either structured or unstructured data and then processes it according to an algorithm written for a given application. The goal is to find relevant correlations and patterns within the data, to make inferences and decisions based on them, and to act on those inferences in a way that satisfies the needs of the application. Intensive computer processing is required, given the size of the data sets and the sophistication of the algorithms.

"At the present time most AI tasks are running in the cloud, but the data sets that are fed into the algorithms in the cloud come in from the outside world, through an analog interface like an IoT device on the edge," said Ted Letavic, CTO and VP Computing and Wireless Infrastructure (CWI) at GF. "The cloud-based AI paradigm is energy inefficient, as it requires the transport of large amounts of data from the edge of the network (IoT edge) to the data center where the computations are performed and results derived, and subsequent transport of the results back to the edge device. Not only is this energy inefficient, the time associated with data transport results in an overall system latency which precludes use for many safety-critical AI applications."
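To make that trade-off concrete, here is a rough, purely illustrative back-of-envelope sketch in Python. Every constant is an assumed round number chosen only to show the shape of the comparison (none of these figures come from GF or from this article): shipping raw sensor data to the cloud costs radio energy and round-trip latency that on-device inference avoids.

```python
# Illustrative edge-vs-cloud comparison. All constants are hypothetical,
# round-number assumptions used only to show the shape of the trade-off;
# none of these figures come from GF or from the article.

IMAGE_BITS = 224 * 224 * 3 * 8       # one small RGB frame, ~1.2 Mbit
RADIO_ENERGY_PER_BIT = 100e-9        # J/bit spent by the edge radio (assumed)
NETWORK_RTT_S = 0.050                # edge-to-cloud round trip (assumed)
LOCAL_MACS = 500e6                   # multiply-accumulates for a small CNN (assumed)
LOCAL_ENERGY_PER_MAC = 1e-12         # J/MAC on an efficient edge accelerator (assumed)
LOCAL_LATENCY_S = 0.010              # on-device inference time (assumed)

offload_energy = IMAGE_BITS * RADIO_ENERGY_PER_BIT   # the edge device's share only
local_energy = LOCAL_MACS * LOCAL_ENERGY_PER_MAC

print(f"Radio energy to offload one frame: {offload_energy * 1e3:.1f} mJ")
print(f"On-device inference energy:        {local_energy * 1e3:.2f} mJ")
print(f"Latency floor: cloud {NETWORK_RTT_S * 1e3:.0f} ms round trip "
      f"vs. local {LOCAL_LATENCY_S * 1e3:.0f} ms")
```

Under these assumptions the radio alone costs roughly two orders of magnitude more energy per frame than local inference, and the network round trip sets a latency floor that no amount of data-center compute can remove, which is the argument for moving more AI processing to the edge.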
At first, traditional general-purpose central processing units (CPUs) were used for AI and machine learning. "These were designed for random memory access, which has become problematic given the growing need to reduce the time and energy spent transferring data between processors and memory," Letavic said. "We need to change the paradigm, and process the data stored within the memory network itself without having to transport it."

As a result, he said, a fundamental shift in computing architectures is taking place. A "renaissance of design" is moving the industry toward domain-specific compute architectures that are extremely energy efficient for AI inference and training tasks with well-defined dataflow and compute paths. These optimized accelerators embed computation within the memory hierarchy itself, in approaches often referred to as "digital compute-in-memory" or "analog compute-in-memory." They perform highly parallel operations, making them ideal for the type of computations at the heart of AI, and they do so at substantially lower total power, which enables greater use of AI at the network edge.

4X More Efficient Memory with GF's 12LP+

To accommodate these changes in architecture, GF has made technology improvements and enabled new design flows. "In virtually every single AI workload we examined, memory bandwidth and memory access power limited overall capabilities, because a certain number of operations must take place within a fixed power budget, and memory consumed far too much of it," Letavic said. "So we applied some learnings from our 7nm technology development effort to our 12LP/12LP+ technology, and came out with the industry's first 1 GHz-capable 0.55V SRAM memory macros, which for typical workloads reduce the energy associated with memory access by a factor of four. This solution is targeted at systolic array processors and is directly applicable to AI and machine learning workloads."

Next, GF looked at the array architectures, Letavic said. "We found that every single customer had a different dataflow architecture and there was basically no way to select an optimum design," he said. "To address this, we created a novel design flow that synthesizes logic and memory elements together so they can be built in very close proximity with a high degree of flexibility. This design flow breaks the conventional paradigm of logic and memory macro synthesis, and the intermingling of logic and memory elements can be used to implement very novel AI architectures."

Advances in GF technology, coupled with a new and unique design and synthesis flow, are powerful tools for implementing new compute paradigms, Letavic said, and further unlock the promise of AI. Important work in this area is taking place in collaboration with leading research institutions.
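Letavic's point about memory access dominating the power budget can be illustrated with a minimal sketch. The per-operation energies below are generic, assumed values used only for illustration (not measurements of any GF process); the sketch simply counts operations for one fully connected layer and shows how the total shifts when memory access becomes four times cheaper:

```python
# Toy energy accounting for one fully connected layer (a matrix-vector multiply).
# The per-operation energies are generic illustrative values, not GF data.

ROWS, COLS = 1024, 1024            # weight matrix dimensions
E_MAC = 0.1e-12                    # J per multiply-accumulate (assumed)
E_SRAM_READ = 2.0e-12              # J per weight fetched from SRAM (assumed)

macs = ROWS * COLS                 # one MAC per weight
weight_reads = ROWS * COLS         # one SRAM read per weight (no reuse assumed)

compute_energy = macs * E_MAC
for label, e_read in [("baseline SRAM access", E_SRAM_READ),
                      ("4x cheaper SRAM access", E_SRAM_READ / 4)]:
    memory_energy = weight_reads * e_read
    total = compute_energy + memory_energy
    print(f"{label:>22}: memory {memory_energy * 1e6:.2f} uJ, "
          f"compute {compute_energy * 1e6:.2f} uJ, "
          f"memory share {100 * memory_energy / total:.0f}%")
```

With these assumptions memory access accounts for roughly 95 percent of the layer's energy, so cutting its cost by a factor of four does far more for the total than speeding up the arithmetic. The same logic motivates compute-in-memory designs, which avoid many of those data movements altogether.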
Dr. Marian Verhelst and GF's University Connection

GF is collaborating with some of the world's leading researchers to study these novel architectures and to establish objective benefits and proof points for them, which GF's customers can then use to design more efficient AI systems. Much of this work takes place through collaborations with research consortia such as imec, and with university professors through GF's University Partnership Program (UPP). Under the program, GF works closely with academic researchers worldwide on innovative projects leveraging GF technology.

One of GF's leading academic collaborators is Dr. Marian Verhelst, a professor at KU Leuven in Leuven, Belgium, and a research director at imec. Dr. Verhelst is one of the world's leading experts in highly efficient processing architectures. She previously worked at Intel Labs in the U.S. on digitally enhanced analog and RF circuits, and joined KU Leuven in 2012, where she started a research lab that currently has 16 doctoral students and postdoctoral researchers. Her lab's work encompasses everything from long-horizon, big-picture projects funded by the European Union to nearer-term efforts that involve technology transfer to a wide range of industry players.

She has been awarded the André Mischke YAE Prize, which recognizes internationally leading academic research, management, and evidence-based policy making. A former member of the Young Academy of Belgium and the Flemish STEM platform, she is an outspoken advocate for science and education, and has been featured on several popular science shows on national television. In 2014, she founded InnovationLab, which develops interactive engineering projects for high school teachers and their students. She is also a member of the IEEE's Women in Circuits initiative, among many other advocacy and educational activities.

The DIANA Chip – a Significant Step Forward for AI

Dr. Verhelst has led an effort to produce a hybrid neural network chip that is the world's first to combine analog compute-in-memory and digital systolic arrays, and to seamlessly partition an AI algorithm across these heterogeneous resources to achieve the best balance of energy, accuracy and latency. Called DIANA (DIgital and ANAlog), the chip was built on GF's 22FDX platform and will be featured in a paper to be delivered later this month at the prestigious 2022 International Solid-State Circuits Conference (ISSCC).

"Machine learning is booming and everyone has a processor optimized for machine learning, but mostly they've been designed purely in the digital domain, and they compute using zeros and ones, which isn't always the most efficient thing you can do," Verhelst said. "Therefore, many researchers are now investigating computing in the analog domain, even inside SRAM memories, working with current accumulation across SRAM cells instead of with zeros and ones. That can be much more efficient from an energy point of view, and also from a chip-density point of view, because it allows you to do more computing per square millimeter."

"There have been some excellent results thus far, but only for specific machine learning networks which happen to nicely match the shape of the memories. For others, the algorithms don't necessarily run efficiently," she said. "The DIANA chip contains a host processor along with both a digital and an analog in-memory co-processor. For every layer of a neural network, it can dispatch that layer to the inference accelerator or co-processor that will operate most efficiently. Everything runs in parallel and intermediate data is efficiently shared among the layers."

To achieve this, Verhelst's team developed advanced schedulers and mappers, which analyze a chip's hardware characteristics to determine the most energy-optimal or most latency-optimal "order of compute," that is, how to run a given algorithm on the chip. "There are many ways to run an algorithm, depending on how much memory you have, its characteristics, how many compute elements there are in your processing array, and so on," she said. "So we developed tools into which you can enter the hardware characteristics, and which help to find the optimal solution for your workload."
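To show the flavor of this per-layer dispatch, here is a highly simplified sketch. The cost model, energy figures and layer shapes are all invented for illustration; the real DIANA schedulers and mappers are far more sophisticated and account for memory capacity, array geometry and data movement, not just a per-MAC energy number.

```python
# Simplified sketch of dispatching neural-network layers between a digital
# systolic co-processor and an analog in-memory co-processor, in the spirit of
# the DIANA approach described above. All numbers are invented for illustration.

from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    macs: int            # multiply-accumulates required by the layer
    fits_array: bool     # does the layer map cleanly onto the analog array shape?

E_DIGITAL_PER_MAC = 1.0e-12   # J/MAC on the digital systolic array (assumed)
E_ANALOG_PER_MAC = 0.2e-12    # J/MAC on the analog in-memory array (assumed)
ANALOG_MISMATCH_PENALTY = 10  # assumed inefficiency when the shape doesn't fit

def dispatch(layers):
    """Send each layer to whichever co-processor the cost model says is cheaper."""
    plan = []
    for layer in layers:
        digital_cost = layer.macs * E_DIGITAL_PER_MAC
        analog_cost = layer.macs * E_ANALOG_PER_MAC
        if not layer.fits_array:
            analog_cost *= ANALOG_MISMATCH_PENALTY
        plan.append((layer.name, "analog" if analog_cost < digital_cost else "digital"))
    return plan

network = [Layer("conv1", 90_000_000, True),
           Layer("dwconv2", 5_000_000, False),
           Layer("conv3", 120_000_000, True),
           Layer("fc4", 2_000_000, False)]

for name, target in dispatch(network):
    print(f"{name}: run on the {target} co-processor")
```

A scheduler built this way does in miniature what the quote describes: feed in the hardware characteristics, evaluate the candidate mappings, and pick the most energy-efficient (or, with a different cost function, the most latency-efficient) way to run the workload.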
An Ongoing Collaboration

The DIANA chip is the latest result of Verhelst's work with GF, which began about five years ago when GF offered one of her Ph.D. students the opportunity to tape out a video processing chip on 22FDX technology that could efficiently carry out hundreds of operations in parallel. Subsequently, Verhelst used GF's 12LP+ technology to build a deep-learning chip with a very dense compute fabric, more than 2,000 multipliers and a large amount of on-chip SRAM. Yet another project, still in its initial stages, is to use GF's 22FDX platform to build a heavily duty-cycled machine learning chip focused on extremely low-power operation for IoT, machine monitoring or other sensor nodes that must operate on milliwatts of power.

She says the silicon access and technical partnership that GF provides are invaluable. "Producing working silicon can be very expensive, especially for digital processors, which are physically large. Working with GF provides us both a lower barrier to silicon and access to the latest relevant IP," she said. "Also, GF provides us with advice and support for what are sometimes difficult physical design closure jobs, which isn't necessarily trivial any longer given these advanced technologies. There are so many things you have to take into account in the backend that GF's manufacturing experience really helps us when we are trying to ensure things like fast IO, good oscillators, optimum power gating and so on."

Looking Ahead

When asked what's next for GF with regard to more energy-efficient AI, Letavic mentioned the company's work on integrated voltage regulation for the compute die itself, and on silicon photonics for even higher levels of transport and compute efficiency. "Improved power delivery is a way to compensate for the lack of power scaling at smaller nodes, which has become a real limitation at the systems level," he said. "One of the key ways to save total application power is just to be more efficient in the way you deliver current and voltage to the processor core. We're exploring various options, and it could be a very large opportunity for GF given our long heritage in bipolar, CMOS and DMOS power devices."

Letavic also mentioned that photonic acceleration, or using light (photons) instead of electricity (electrons) not only to transmit signals over optical fiber but for computing itself, may come to play a significant role in AI. "I would say this is developing at a rate much faster than I had expected. And it's another place where we have some really solid university engagements."

Read about other research taking place through GF's University Partnership Program:

GF Drives Progress in Next-Generation Automotive Radar
Academic Collaborations Strengthen, Hasten GF's Path to 6G Leadership
GF Partners with Leading Researchers on 6G Technologies