Tutorial Day - 1 | 8th Jan, 2023 | Sunday

1.1.1 Optimizations for Domain Specific MPSoC computing

Speakers

  • Sri Parameswaran, Professor, School of Computer Science and Engineering, The University of New South Wales, Australia
  • Joerg Henkel, Chair of Embedded Systems, Karlsruhe Institute of Technology, Germany
  • Preeti Panda, Prof. Anshul Kumar Chair; Professor, Dept. of Computer Sc. & Engg. and School of IT, IIT Delhi, India
  • Soumya J, Assistant Professor, Department of Electrical and Electronics Engineering, BITS Pilani, Hyderabad Campus, India

Abstract

Meeting the multitude of constraints when designing a modern Multi-Processor System on Chip (MPSoC) requires a careful choice among thousands of components, network parameters, architectures, and other design options. An exhaustive search for the optimal solution would take far too long to fit the design turnaround times demanded by the contemporary market. In this tutorial, we examine several different ways in which optimizations can be performed to meet these constraints.

Smart resource management for multicores
This part of the tutorial starts by introducing basic concepts of machine learning for resource-constrained systems and then spends most of its time on representative resource management approaches. As far as system constraints are concerned, the focus is on low power and thermal management. Approaches such as smart thermal management and smart boosting are presented in more detail. By the end, attendees will have a good knowledge of the state of the art in smart resource management techniques for multicores.

Customizing 3D Memory Thermal Management for Neural Networks
In this part of the tutorial, we elaborate on mechanisms recently studied for extracting high performance from stacked 3D memory systems operating under thermal constraints. 3D memory offers the possibility of high data access throughput, but higher power densities lead to thermal hotspots that need careful system level management. Dynamic Thermal Management strategies need to be carefully coordinated with voltage/frequency scaling and task mapping decisions.
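The coordination between Dynamic Thermal Management and voltage/frequency scaling mentioned above can be illustrated with a minimal sketch of a threshold-based throttling policy with hysteresis. The frequency levels, the 85 °C and 75 °C thresholds, and the function name `next_level` are hypothetical choices for illustration; the mechanisms studied in the tutorial also involve task mapping and thermal models specific to 3D memory, which this sketch does not attempt.

```python
# Hypothetical threshold-based DTM policy driving DVFS; all values are illustrative.
FREQ_LEVELS_MHZ = [800, 1200, 1600, 2000]  # hypothetical V/f operating points, slowest first


def next_level(level: int, temp_c: float,
               t_high: float = 85.0, t_low: float = 75.0) -> int:
    """Return the next DVFS level index for the measured temperature.

    Step down one level above t_high, step up one level below t_low, and hold
    inside the hysteresis band [t_low, t_high] to avoid oscillating between
    operating points.
    """
    if temp_c > t_high and level > 0:
        return level - 1
    if temp_c < t_low and level < len(FREQ_LEVELS_MHZ) - 1:
        return level + 1
    return level


if __name__ == "__main__":
    level = len(FREQ_LEVELS_MHZ) - 1  # start at the highest frequency
    for temp in [70.0, 88.0, 90.0, 80.0, 72.0]:
        level = next_level(level, temp)
        print(f"T = {temp:.0f} C -> {FREQ_LEVELS_MHZ[level]} MHz")
```

With this temperature trace the policy stays at 2000 MHz, steps down to 1600 MHz and then 1200 MHz while hot, holds 1200 MHz inside the hysteresis band, and returns to 1600 MHz once the temperature falls below 75 °C.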

Optimization of on-chip Networks for General and Application-Specific Multiprocessor Architectures
In this part of the tutorial, we investigate on-chip network communication among the processors of a multiprocessor. For data-intensive and communication-intensive applications, we discuss how tasks are mapped onto the network, together with optimizing communication cost, latency, and related metrics. This part also covers the design of on-chip network router topologies tailored to application requirements.
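To make the task-mapping objective concrete, here is a minimal sketch of one common cost model: each communicating task pair contributes its traffic volume multiplied by the hop count between the cores it is mapped to. It assumes a 2D mesh with XY (dimension-order) routing, so the hop count is the Manhattan distance; the task graph, the traffic volumes, and the function name `comm_cost` are hypothetical.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int]   # (x, y) position of a router in the mesh
Edge = Tuple[str, str]    # (source task, destination task)


def comm_cost(traffic: Dict[Edge, float], mapping: Dict[str, Coord]) -> float:
    """Communication cost of a task-to-core mapping on a 2D mesh NoC.

    Each flow contributes (traffic volume) x (hop count). With XY
    (dimension-order) routing on a mesh, the hop count between two routers
    equals their Manhattan distance.
    """
    total = 0.0
    for (src, dst), volume in traffic.items():
        (x1, y1), (x2, y2) = mapping[src], mapping[dst]
        total += volume * (abs(x1 - x2) + abs(y1 - y2))
    return total


if __name__ == "__main__":
    # Hypothetical task graph: traffic volumes between four tasks.
    traffic = {("A", "B"): 100, ("B", "C"): 50, ("C", "D"): 100, ("A", "D"): 10}
    # Two candidate mappings onto a 2x2 mesh.
    ring = {"A": (0, 0), "B": (0, 1), "C": (1, 1), "D": (1, 0)}
    diagonal = {"A": (0, 0), "B": (1, 1), "C": (0, 1), "D": (1, 0)}
    print(comm_cost(traffic, ring))      # 260.0
    print(comm_cost(traffic, diagonal))  # 460.0
```

Placing heavily communicating tasks on neighbouring routers (the "ring" mapping) puts every flow one hop apart, whereas the "diagonal" mapping sends the two heaviest flows over two hops and nearly doubles the cost. Real mapping tools also weigh link contention and latency, which this model ignores.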

Optimization of Pipelined Multiprocessor Architectures and Memories
In this part of the tutorial, we examine a different paradigm in multiprocessor architectures. Streaming applications, such as video compression, decompression, and transmission, require a significant amount of processing, so the processors must be distributed and connected in particular ways to maximize performance with minimal additional resources. Memory, too, has to be carefully partitioned among these processors so that the system is adequately provisioned to meet its deadlines while ensuring that no part of it is starved. We discuss how such systems are created and examine ways of optimizing them for efficient throughput.
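The throughput argument for pipelined multiprocessors can be sketched in a few lines: once the pipeline is full, one item leaves every max(stage time), so the slowest stage sets the throughput. This sketch assumes one processor per stage, inter-stage buffers large enough to decouple the stages, and communication time folded into each stage time; the stage times and function names are hypothetical.

```python
from typing import List


def throughput_per_s(stage_times_ms: List[float]) -> float:
    """Steady-state throughput of a linear pipeline (items per second).

    Once the pipeline is full, one item leaves every max(stage time), so the
    slowest stage (the bottleneck) sets the throughput.
    """
    return 1000.0 / max(stage_times_ms)


def latency_ms(stage_times_ms: List[float]) -> float:
    """Time for a single item to traverse every stage."""
    return sum(stage_times_ms)


if __name__ == "__main__":
    stages = [8.0, 20.0, 12.0, 10.0]                # hypothetical per-frame stage times
    print(throughput_per_s(stages))                  # 50.0 frames/s, limited by the 20 ms stage
    # Split the bottleneck stage across two processors (10 ms each).
    split = [8.0, 10.0, 10.0, 12.0, 10.0]
    print(round(throughput_per_s(split), 2))         # 83.33 frames/s, now limited by the 12 ms stage
    print(latency_ms(stages), latency_ms(split))     # 50.0 50.0
```

Spending one extra processor on the bottleneck raises throughput by about two thirds here, while adding processors to any other stage would not help at all; this is the kind of trade-off between performance and additional resources the session addresses.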

Duration: 3 hrs [45 min for each speaker]

Speakers Bio

Jörg Henkel

Jörg Henkel is with the Karlsruhe Institute of Technology (KIT), Germany, where he holds the Chair for Embedded Systems (CES). Before that, he was a Senior Research Staff Member at NEC Laboratories in Princeton, NJ. He received his PhD from Braunschweig University with the distinction “summa cum laude”. He is or has been the General Chair of major conferences in design automation and embedded systems, such as ICCAD, ESWeek, and DAC (in 2023). He received the 2008 DATE Best Paper Award, the 2009 IEEE/ACM William J. McCalla ICCAD Best Paper Award, and the CODES+ISSS 2015, 2014, and 2011 Best Paper Awards. He is the Chairman of the IEEE Computer Society, Germany Section. He was the Editor-in-Chief of the ACM Transactions on Embedded Computing Systems (ACM TECS) and the Editor-in-Chief of the IEEE Design & Test Magazine, both for two terms. He served as the conference chair and Vice Chair for ACM SIGDA. He is an initiator and the coordinator of the German Research Foundation’s (DFG) program on ‘Dependable Embedded Systems’ (SPP 1500). He is the site coordinator (Karlsruhe site) of the Three-University Collaborative Research Centre on “Invasive Computing” (DFG TR89). He is currently also the Vice President for Publications of IEEE CEDA. He holds ten US patents and is a Fellow of the IEEE.

Preeti Ranjan Panda

Preeti Ranjan Panda received his B.Tech. in CSE from IIT Madras and his M.S. and Ph.D. from the University of California, Irvine. He is currently a Professor in CSE at IIT Delhi. He has previously worked at Texas Instruments and Synopsys and has been a visiting scholar at Stanford University. His research interests span various topics in Embedded Systems and EDA. He is the author of two books on embedded memory and power-efficient system design, and a recipient of an IBM Faculty Award, the IESA Techno Mentor Award, and a Department of Science and Technology Young Scientist Award. Research works authored by Prof. Panda and his students have received several honours, including Best Paper nominations at CODES+ISSS, DATE, ASPDAC, and the VLSI Design Conference, and a Most Downloaded Paper of the ACM TODAES journal. Prof. Panda has served as the Editor-in-Chief of IEEE Embedded Systems Letters; on the editorial boards of several journals, including IEEE TCAD, ACM TODAES, IEEE TMSCS, and IJPP; as the General co-Chair of VLSI Design; as Technical Program co-Chair of CASES and CODES+ISSS; and on the organizing/program committees of several conferences in the areas of Embedded Systems and Design Automation, including DAC, ICCAD, DATE, IPDPS, ASPDAC, and EMSOFT.

Soumya Joshi

Soumya Joshi is a young researcher in the area of on-chip network design, optimization, and verification. She is an Assistant Professor in the Department of Electrical and Electronics Engineering, Birla Institute of Technology and Science, Pilani, Hyderabad Campus. She was a visiting scholar at TU Wien in 2018 and visiting faculty at UNSW in 2019, and has been a visiting scholar at the University of Agder since 2017. Before joining BITS, she was a faculty member at the National Institute of Technology Goa. She also worked as Scientist ‘SC’ at the Indian Space Research Organisation. She received her Master’s and PhD degrees from IIT Kharagpur in the area of Embedded Systems. She is a recipient of the Early Career Research Award (2017).

Sri Parameswaran

Sri Parameswaran is a Professor in the School of Computer Science and Engineering at the University of New South Wales. He served as Acting Head of School at the University of New South Wales from 2019 to 2020, and as the Program Director for Computer Engineering. His research interests are in System Level Synthesis, Low Power Systems, High Level Systems, and Networks on Chip. He has also served as Editor-in-Chief. He has served on the Program Committees of the Design Automation Conference (DAC), Design, Automation and Test in Europe (DATE), the International Conference on Computer Aided Design (ICCAD), the International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), and the International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES). Sri Parameswaran received his B.Eng. degree from Monash University and his PhD from The University of Queensland.

1.1.2 RISC-V: The Open Era of Computing

Speaker

  • P R Sivakumar, Founder and CEO, Maven Silicon

Abstract

Processors are the brains of chips. The evolution of processors has enabled chip architects and designers to create robust electronic systems such as smartphones, laptops, cloud servers, and many more. Although this evolution has been driven by various proprietary ISAs, the current electronics market and ecosystem demand an open ISA to enable innovation in processor and system design, especially to support emerging technologies like artificial intelligence. New and powerful SoCs and chips can be built with both proven proprietary ISAs and open ISAs. In this tutorial, we explain RISC-V in depth and discuss how the open RISC-V ISA will lead us into the ‘Open era of computing’.

Tutorial Plan

Agenda

RISC-V ISA [1.5 hr.]

  • RISC-V Overview
  • RISC-V ISAs and Extensions, and ecosystem – Overview
  • RISC-V RV32I Instructions
  • RISC-V CSRs
  • RISC-V Interrupts
  • RISC-V Debug
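As a small taste of the RV32I instruction topic, the sketch below splits a 32-bit instruction word into the fixed fields of the base R-type and I-type formats (opcode in bits 6:0, rd in 11:7, funct3 in 14:12, rs1 in 19:15, rs2 in 24:20, funct7 in 31:25, and a sign-extended 12-bit immediate in bits 31:20 for I-type). It is a simplified illustration, not a complete decoder: S/B/U/J-type immediates are laid out differently and are not handled, and the function names are hypothetical.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def decode_rv32i(insn: int) -> dict:
    """Extract the R-type/I-type fields of a 32-bit RV32I instruction word.

    For I-type instructions, bits 31:20 form the immediate, so the rs2 and
    funct7 fields returned for them are not meaningful.
    """
    fields = {
        "opcode": insn & 0x7F,
        "rd": (insn >> 7) & 0x1F,
        "funct3": (insn >> 12) & 0x7,
        "rs1": (insn >> 15) & 0x1F,
        "rs2": (insn >> 20) & 0x1F,
        "funct7": (insn >> 25) & 0x7F,
    }
    if fields["opcode"] == 0x13:  # OP-IMM group (e.g. ADDI): imm[11:0] in bits 31:20
        fields["imm"] = sign_extend(insn >> 20, 12)
    return fields


if __name__ == "__main__":
    print(decode_rv32i(0x00A50533))  # add  x10, x10, x10 (opcode 0x33, OP)
    print(decode_rv32i(0x00500093))  # addi x1, x0, 5
    print(decode_rv32i(0xFFF00093))  # addi x1, x0, -1
```

Because RISC-V keeps rd, rs1, rs2, and funct3 at the same bit positions across formats, a hardware decoder can read the register specifiers before it knows the instruction type, which is one reason the base ISA suits the simple 5-stage pipelines covered later in the tutorial.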

RISC-V RTL Design & Verification [1 hr.]

  • RISC-V RV32I 5-stage pipeline processor design & performance
  • RISC-V RV32I RTL verification – Verification Plan, UVM TB architecture, and Demo
  • RISC-V SoC Verification – SoC Verification Environment and Verification Flow

Conclusion: RISC-V – The open era of computing [30 min]

  • Why RISC-V?
  • RISC-V Open ISA adoption: Case Studies and Examples
  • RISC-V evolution in next ten years

Speaker Bio

Sivakumar P R

Sivakumar P R is the Founder and CEO of Maven Silicon. He is responsible for the company’s vision, overall strategy, business, and technology. He is also the Founder and CEO of Aceic Design Technologies.

Sivakumar is a seasoned engineering professional who has worked for over 25 years in various fields, including electrical engineering, academia, and semiconductors. In the semiconductor industry, he has worked as a Verification Consultant for the top EDA companies Synopsys, Cadence, and Mentor Graphics, and has helped various ASIC and FPGA design houses deploy and use verification methodologies effectively, resulting in successful tape-outs of IPs, chips, and SoCs. He now specializes in offering Verification IPs, consulting services, and EDA flow development, and has delivered corporate training courses for top global EDA and VLSI MNCs. He is also the author of Maven Silicon’s online VLSI courses and a blogger for Design & Reuse, SemiWiki, and RISC-V Blogs.

He is the recipient of the “Outstanding Technical Achievement” award from Cadence Design Systems and holds a degree in Electrical and Electronics Engineering from Madurai Kamaraj University.

1.2.1 Systematic design of Bandgap Voltage Reference Circuits

Speakers

  • Prof Shouri Chatterjee, IIT Delhi
  • Dr Rakesh Kumar Palani, Assistant Professor, Electrical Engineering Department, IIT Delhi
  • Prof Rajasekhar Nagulapalli, Oxford Brookes University, UK

Abstract

Almost every SoC requires an accurate reference voltage, which is a key requirement for voltage regulators and ADC/DAC references. With technology scaling, there is a need to generate reference voltages with nominal values below 1 V. Bandgap references (BGRs) provide a temperature-insensitive voltage or current. Existing literature on BGRs can be classified into two categories. The first is accurate reference generation based on bipolar transistors, which finds applications in battery chargers, ADCs, DACs, and PLLs, where power consumption is secondary. The second is extremely low-power reference generation, which is typically MOS-based and finds applications in energy harvesting systems. The classic Brokaw bandgap reference uses NPN bipolar transistors. However, a standard CMOS process does not provide an NPN bipolar transistor, so conventional CMOS bandgaps are built using the parasitic vertical PNP formed by the p+ diffusion, the n-well, and the p-substrate. This tutorial focuses on the design of both low-power and accurate bandgap references, with an emphasis on the use of PNP bipolar transistors. The following topics will be covered in the tutorial.

  • Performance metrics of the BGR and applications
  • MOS-based nano watt BGR principle
  • Properties of the parasitic PNP devices
  • Principle of BGR and mathematical background
  • Basics of Chopping and auto zeroing for use in BGR
  • Existing architectures and comparison
  • Curvature compensation
  • Design methodology of the Bandgap
  • Bandgap Startup and simulation methodology
  • Self-bias op-amp basics and intuitive explanation
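The "Principle of BGR" topic can be previewed with the standard first-order textbook derivation, given here as a sketch rather than the exact formulation used in the tutorial. Two PNP devices biased at a current-density ratio N produce a PTAT voltage ΔV_BE, which is scaled by a gain K (set by resistor ratios in a real circuit) and added to the CTAT base-emitter voltage V_BE. The numbers assume typical room-temperature values of V_BE ≈ 0.75 V and ∂V_BE/∂T ≈ −1.5 mV/K at T ≈ 300 K, and ignore the curvature of V_BE(T) that curvature compensation addresses.

```latex
\begin{aligned}
\Delta V_{BE} &= V_{BE1} - V_{BE2} = V_T \ln N, \qquad V_T = \frac{kT}{q} \approx 25.9\ \text{mV} \\
V_{REF} &= V_{BE} + K\,\Delta V_{BE} = V_{BE} + K\,V_T \ln N \\
\frac{\partial V_{REF}}{\partial T} &= \frac{\partial V_{BE}}{\partial T} + K \ln N \,\frac{k}{q} = 0
\quad\Longrightarrow\quad
K \ln N = -\frac{\partial V_{BE}/\partial T}{k/q} \approx \frac{1.5\ \text{mV/K}}{0.0862\ \text{mV/K}} \approx 17.4 \\
V_{REF} &\approx 0.75\ \text{V} + 17.4 \times 25.9\ \text{mV} \approx 1.2\ \text{V}
\end{aligned}
```

The resulting output of roughly 1.2 V, close to the extrapolated bandgap of silicon, is why the sub-1 V references mentioned in the abstract call for modified architectures.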

Speakers Bio

Shouri Chatterjee

Shouri Chatterjee received the B.Tech. degree in Electrical Engineering from the Indian Institute of Technology, Madras, in 2000, and the M.S. and Ph.D. degrees in Electrical Engineering from Columbia University, New York, in 2002 and 2005, respectively. From 2005 to 2006, he was a design engineer in the wireless division at Silicon Laboratories Inc., Somerset, NJ. In November 2006, he joined the faculty of the department of Electrical Engineering of the Indian Institute of Technology, Delhi, India, as an Assistant Professor. He became a full Professor in March 2019. Currently, he is also serving as the Associate Dean, Curriculum, at IIT Delhi. Prof Chatterjee has worked in multiple areas of power management and RF circuits. A 20-nW MPPT energy harvesting power management circuit designed by him and his team is still among the state-of-the-art energy harvesting designs published.

Rajasekhar Nagulapalli

Rajasekhar Nagulapalli received the B.Tech. degree in electrical engineering from Acharya Nagarjuna University, Guntur, India, in 2005, and the M.Tech. degree in microelectronics from the Indian Institute of Science, Bengaluru, India, in 2008, and is presently working toward his PhD at Oxford Brookes University, UK. From 2008 to 2011, he was with Rambus Semiconductors, Bengaluru, where he designed PLLs and equalizers. From 2011 to 2014, he was with IHP Microelectronics, Frankfurt (Oder), Germany, where he was involved in the design of high-speed TIAs and PAM4 circuits. Since 2014, he has been with Inphi, Northampton, U.K., where he is involved in high-speed serial links, PLLs, and CDRs. He has authored more than 20 papers in the field of SERDES and has 14 granted U.S. patents.

Rakesh Kumar Palani

Rakesh Kumar Palani received the B.Tech. degree in Electrical Engineering from the National Institute of Technology, Kurukshetra, Haryana, in 2007, an MSc (Engg) in Microelectronics from the Indian Institute of Science, Bengaluru, India, in 2009, and a Ph.D. from the University of Minnesota, Minneapolis, in 2015. He was with Broadcom, Irvine, from 2015 to 2017, where he worked on high-performance analog circuits. Thereafter, he worked at MaxLinear, Irvine, where he was involved in the design of high-speed DACs. He is presently an Assistant Professor in the Department of Electrical Engineering at IIT Delhi. His current interests lie in the development of IPs in the analog/mixed-signal domain. He also served as Chair of the IEEE CAS-CSS Delhi Chapter from 2021 to 2022.

1.2.2 CMOS Clocking Technology for Wireline, Wireless and SoC Applications

Speaker

  • Dr. Hormoz Djahanshahi, Associate Technical Fellow, Microchip Technology Inc

Abstract

This tutorial provides an overview of integrated clocking technology and IPs developed in CMOS processes. We discuss four categories of clock generation IPs, for applications in wireline SERDES, wireless base station local oscillators (LOs), digital timing in system-on-chip (SoC) devices, and reference or recovered clock jitter attenuation. The evolution of clocking IPs from older CMOS technology nodes to 16nm FinFET is tracked through figures of merit (FoMs) that capture a combination of power dissipation, clock phase noise or jitter, and silicon die area.
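As an illustration of how such figures of merit combine power and jitter, the sketch below evaluates one widely used jitter-power FoM for PLLs and clock generators, FoM = 10·log10((σ_t / 1 s)² · (P / 1 mW)), where lower is better. The tutorial may use other FoMs, including ones that also account for die area; the function name and the example operating points are hypothetical.

```python
import math


def jitter_fom_db(rms_jitter_s: float, power_w: float) -> float:
    """Jitter-power figure of merit of a clock generator, in dB.

    FoM = 10 * log10((sigma_t / 1 s)^2 * (P / 1 mW)). Lower (more negative)
    is better: it rewards low RMS jitter sigma_t and low power P together.
    """
    return 10.0 * math.log10((rms_jitter_s ** 2) * (power_w / 1e-3))


if __name__ == "__main__":
    print(round(jitter_fom_db(100e-15, 10e-3), 2))  # -250.0  (100 fs rms, 10 mW)
    print(round(jitter_fom_db(200e-15, 5e-3), 2))   # -246.99 (2x jitter, half the power)
```

Because jitter enters squared, halving the power while doubling the jitter worsens this FoM by about 3 dB, which is why low-jitter designs can justify spending more power.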

Speaker Bio

Dr. Hormoz Djahanshahi

Dr. Hormoz Djahanshahi received the B.Sc. and M.Sc. degrees (both Hons.) in Electronics Engineering from Amirkabir University (Tehran Polytechnic). He received his Ph.D. from the University of Windsor in analog VLSI and Neural Networks in 1997 and was a Post-Doctoral Fellow at the University of Toronto in 1997-1999, working on high-speed I/O and clocking circuitry. Since 2000, he has been with the Mixed-Signal Development Group at PMC-Sierra, which was acquired by Microsemi in 2016 and by Microchip Technology in 2018. He is currently an Associate Technical Fellow at Microchip, has 26 patents and 42 publications, and has extensive industry experience designing and mentoring dozens of high-speed and high-performance clocking and analog designs in SERDES and wireless products. He also participates in the OIF and PCI-SIG standards workgroups. He is also a member of the Royal Astronomical Society of Canada and in his spare time watches the sky from his backyard through his 8” telescope.

1.3.1 Neuromorphic Computing: Technologies, Architectures, Systems, and Compilers

Speakers

  • Jan Moritz Joseph, RWTH Aachen University, Germany
  • Leticia Bolzani Poehls, RWTH Aachen University, Germany

Objectives

  • Present state-of-the-art knowledge on the development of neuromorphic computing, from technologies to circuits and architectures
  • Summarize the realistic application advances for ML and beyond
  • Show important issues related to emerging devices’ quality after manufacturing and reliability during lifetime
  • Summarize the main strategies for manufacturing testing
  • Show important hurdles to scaled and wide adoption of neuromorphic computing and how they can be removed
  • Explain the system and software integration of neuromorphic computing, including compilation, mapping, and hardware prototyping

Abstract

Resistive Random Access Memory (RRAM) is an emerging technology with high potential for many analog and digital applications. RRAM can serve as analog elements in neuromorphic circuits, as general-purpose non-volatile memory, or as non-volatile logic gates. A key advantage of RRAM as an emerging memory technology is that it can perform computation inside the memory array (computing-in-memory, CIM).
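The computing-in-memory idea can be sketched for the ideal case: weights are stored as cell conductances G in a crossbar, inputs are applied as word-line voltages V, and each bit line collects the current I_j = Σ_i G_ij · V_i (Ohm's law per cell, Kirchhoff's current law per column), which is a matrix-vector product computed in place. This is an idealized model for illustration only; wire resistance, sneak paths, device nonlinearity and variation, and the ADCs/DACs of a real array are ignored, and the function name and values are hypothetical.

```python
from typing import List


def crossbar_mvm(conductance_s: List[List[float]], voltages_v: List[float]) -> List[float]:
    """Ideal RRAM crossbar matrix-vector multiply.

    conductance_s[i][j] is the conductance (siemens) of the cell at row i,
    column j. Row voltages are applied to the word lines; by Ohm's law each
    cell contributes G[i][j] * V[i], and by Kirchhoff's current law the
    bit-line current of column j is the sum of those contributions.
    """
    n_cols = len(conductance_s[0])
    currents = [0.0] * n_cols
    for row, v in zip(conductance_s, voltages_v):
        for j, g in enumerate(row):
            currents[j] += g * v
    return currents


if __name__ == "__main__":
    g = [[1e-6, 2e-6],
         [3e-6, 4e-6]]      # hypothetical conductances in siemens
    v = [0.1, 0.2]          # read voltages in volts
    print(crossbar_mvm(g, v))  # approximately [7e-07, 1e-06] amperes
```

In hardware, all column currents settle simultaneously, so the whole product takes one read operation instead of moving every weight to a processing unit; this is the data-movement saving that makes RRAM attractive for edge-AI accelerators.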

In the first part of this talk, we summarize the current state of the art in neuromorphic edge-AI accelerators to motivate their key advantages, and we discuss the challenges this technology must overcome before mass-market adoption is possible. In the second part, we address the first of these challenges: the manufacturing test and reliability of RRAM devices. Finally, in the third part, we focus on a second challenge: software development kits for neuromorphic computing, including compilers and architecture co-optimization methods. We will introduce one possible solution developed at RWTH Aachen University, called “Neureka”. We are convinced that efficient, widely adopted neuromorphic systems will only be possible if they are integrated into existing edge-AI software stacks.

Tutorial Plan

  • Welcome and Organisation [5 minutes]
  • Introduction: [25 minutes]
    • Definition: What is neuromorphic computing?
    • Why does neuromorphic computing promise to solve today’s challenges?
    • What technologies enable neuromorphic computing?
    • What are the core challenges of neuromorphic computing?
  • Q&A session and short break [10 minutes]
  • Test of RRAMs [90 minutes]
    • Manufacturing deviations and their impact on the behaviour of novel devices
    • Defects and fault models
    • Manufacturing test strategies: state-of-the-art
    • Impact of undetectable faults on the reliability of novel devices during lifetime
  • Q&A session and short break [10 minutes]
  • Compilers for neuromorphic computing [30 minutes]
    • Principles of compilers for ML
    • Requirements for compilers in neuromorphic computing
    • Compiler flow
    • Case studies and optimization examples
  • Final Q&A [10 minutes]
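The manufacturing-test block of the plan can be previewed with a deliberately simplified sketch: a toy behavioural RRAM array with stuck-at faults and a march-style sequence that writes and reads back both resistance states. The class `RramArray`, the logical encoding (low-resistance state = 1, high-resistance state = 0), and the three-element march sequence are assumptions for illustration; RRAM-specific fault models discussed in the literature, such as faults leading to intermediate resistance states or read disturb, are not modelled here.

```python
from typing import Dict, List, Optional, Tuple

LRS, HRS = 1, 0  # assumed logical encoding: low-resistance state = 1, high-resistance state = 0
Cell = Tuple[int, int]


class RramArray:
    """Toy behavioural RRAM array; cells listed in `stuck` ignore writes (stuck-at faults)."""

    def __init__(self, rows: int, cols: int, stuck: Optional[Dict[Cell, int]] = None):
        self.rows, self.cols = rows, cols
        self.cells = [[HRS] * cols for _ in range(rows)]
        self.stuck = stuck or {}

    def write(self, r: int, c: int, value: int) -> None:
        self.cells[r][c] = self.stuck.get((r, c), value)

    def read(self, r: int, c: int) -> int:
        return self.stuck.get((r, c), self.cells[r][c])


def march_test(mem: RramArray) -> List[Cell]:
    """Simplified march-style test {w0; r0,w1; r1}: return cells that read back wrongly."""
    addrs = [(r, c) for r in range(mem.rows) for c in range(mem.cols)]
    faulty = set()
    for r, c in addrs:              # M0: write HRS everywhere
        mem.write(r, c, HRS)
    for r, c in addrs:              # M1: expect HRS, then write LRS
        if mem.read(r, c) != HRS:
            faulty.add((r, c))      # detects stuck-at-LRS
        mem.write(r, c, LRS)
    for r, c in addrs:              # M2: expect LRS
        if mem.read(r, c) != LRS:
            faulty.add((r, c))      # detects stuck-at-HRS
    return sorted(faulty)


if __name__ == "__main__":
    mem = RramArray(4, 4, stuck={(0, 1): LRS, (2, 3): HRS})
    print(march_test(mem))  # [(0, 1), (2, 3)]
```

Each cell must be read in both states for both stuck-at faults to be observed, which is why the sequence writes and checks HRS before LRS rather than testing a single value.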

Speakers Bio

Leticia Bolzani Poehls

Leticia Maria Bolzani Poehls graduated in Computer Science at the Federal University of Pelotas (Brazil) in 2001 and received the best thesis award for her work. In 2004, she received her Master of Science degree in Electrical Engineering from the Pontifical Catholic University of Rio Grande do Sul (Brazil). During her Ph.D., her work focused on the development of new techniques for highly reliable Systems-on-Chip. In 2008, she received her Ph.D. in Computer Engineering from the Politecnico di Torino (Italy). She holds three postdoctoral titles: the first, obtained in 2008 in the field of Low Power Design of Integrated Circuits (ICs) at the Politecnico di Torino; the second, in 2010, with a focus on Electromagnetic Interference-Aware Systems-on-Chip Design at the Catholic University of Rio Grande do Sul (Brazil); and the third, from the Politecnico di Torino (Italy) in 2013, in the area of emerging technologies. From 2010 to 2022, she was a Professor at the School of Technology of the Catholic University of Rio Grande do Sul and part of the EASE research laboratory, where she led the OASiS research group. Currently, she is a senior researcher at RWTH Aachen University working on the test and reliability of memristive devices; more specifically, she is developing new fault models and manufacturing test strategies for Resistive RAMs, as well as fault tolerance approaches for memristor-based circuits and systems. Her fields of interest include test and fault tolerance of CMOS-based integrated systems and emerging technologies, power-, aging- and temperature-aware integrated circuit design, and Electronic Design Automation (EDA) tools for the optimization of integrated circuits. Among other activities, she serves as a technical committee member for many IEEE-sponsored conferences. She is also a member of the Steering Committee of the IEEE Latin American Test Symposium and of the Biannual European – Latin American Summer School on Design, Test and Reliability (BELAS). Since 2016, she has been a Coordinating Editor of the Journal of Electronic Testing: Theory and Applications.

Jan Moritz Joseph

Dr. Joseph received his B.Sc. in medical engineering in 2011 and his M.Sc. in computer science in 2014 from the Universität zu Lübeck, Germany. From 2008 to 2014, he was a scholarship holder of the German Merit Foundation (Deutsche Studienstiftung e.V.). Dr. Joseph received his Ph.D. from Otto-von-Guericke Universität Magdeburg, Germany, in 2019, with a thesis titled “Networks-on-Chip for heterogeneous 3D Systems-on-Chip”. His Ph.D. was awarded the highest distinction, “summa cum laude”. In 2020, he received the award for the best PhD thesis from the Faculty of Electrical Engineering and Information Technology at Otto-von-Guericke Universität Magdeburg, Germany. From 2019 to 2020, Dr. Joseph was a visiting researcher at Dr. Krishna’s Synergy Lab at the Georgia Institute of Technology, Atlanta, GA; his stay was partially funded by a scholarship from the German Academic Exchange Service. He joined the Institute for Communication Technologies and Embedded Systems, RWTH Aachen University, in June 2020 as a postdoctoral research fellow in the Chair for Software for Systems on Silicon.

1.3.2 Hardware Design for Machine Learning

Speaker

  • Prof Sandip Kundu, University of Massachusetts, Amherst, MA, USA

Objectives

The objectives of this tutorial are as follows:

  • Present an overview of convolutional neural networks, including the training process, leading into the kernel functions
  • Summarize the trade-offs involved across key hardware design metrics, including accuracy, throughput, latency, cost, programmability, and scalability
  • Present the latest developments in hardware acceleration techniques for CPUs, GPUs, FPGAs, in-memory computing, and computational storage devices (CSDs)

Abstract

In many real-world applications, Deep Neural Networks (DNNs) are now performing at human levels, and DNN-based intelligent services are gaining popularity in various facets of the digital infrastructure from computing to networks and storage.

Today’s DNN models feature hundreds of millions of parameters. These parameters, along with intermediate values, require tens of megabytes to gigabytes of memory. Thus, even a single inference involves not only billions of operations but also a massive flow of data from memory to the computational units.

Improving performance requires spatial computation with a large amount of parallel processing, while reducing energy often means reducing data movement, since moving data typically costs more energy than computing on it. Reduced-precision computation has become a popular way to decrease computational energy, but it comes with an obvious trade-off: increased rounding error. Staging data at various points in the on-chip and off-chip networks is a conventional way to reduce data movement, and it integrates well with various low-power emerging memory technologies. Delegating some computation tasks to memory and/or storage reduces data movement further and has become increasingly popular.
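The storage figures and the precision/rounding-error trade-off above can be made concrete with a minimal sketch of symmetric per-tensor INT8 quantization, one common scheme among several (the tutorial may cover others, such as per-channel or asymmetric quantization). Values are mapped to q = round(x / scale) with scale = max|x| / 127, so the rounding error of each element is at most scale / 2, and storage drops by 4x relative to FP32. The function names and sample values are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT8 quantization: q = round(x / scale), scale = max|x| / 127."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Clamp to [-127, 127] to guard against floating-point round-off at the extremes.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map INT8 codes back to approximate real values."""
    return [qi * scale for qi in q]


def storage_mb(n_params: int, bytes_per_param: int) -> float:
    """Parameter storage in megabytes (1 MB = 10**6 bytes)."""
    return n_params * bytes_per_param / 1e6


if __name__ == "__main__":
    w = [0.5, -1.27, 0.013, 1.0]
    q, s = quantize_int8(w)
    err = max(abs(a - b) for a, b in zip(w, dequantize(q, s)))
    print(q, s, err <= s / 2)                  # rounding error bounded by scale / 2
    print(storage_mb(100_000_000, 4))          # 400.0 MB for 100M FP32 parameters
    print(storage_mb(100_000_000, 1))          # 100.0 MB for the same parameters in INT8
```

Note that the small weight 0.013 is represented by the code 1 (0.01 after dequantization), which illustrates how the rounding error grows relative to small values even though its absolute bound stays at scale / 2.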

General-purpose computing is often an inefficient way to implement spatial computation, as different applications have different accuracy, storage, and computing needs. Despite this handicap, general-purpose computing has made great strides in spatial computing. Recent evolution in vector processing has seen the introduction of AMX (Advanced Matrix Extensions) instructions to the x86 architecture, with support for a variety of data precisions. These developments greatly enhance ML acceleration in general, and ML training in particular.

This tutorial will cover the concepts, the applications, and the current practices.

Tutorial Plan

  • Introduction and Organization [15 min]
  • Convolutional Neural Network [75 min]
    • Structure
    • Inference vs Training
    • Gradient descent
    • Back propagation, importance of floating-point calculation
  • Kernel functions [15 min]
    • Complexity
  • Basic CPU Architecture and Vector Processing [45 min]
    • SIMD architecture/programming
    • AVX/AMX instructions
    • MPI programming
  • Quantization and pruning [15 min]
  • TPU Case Study and Systolic Array [30 min]
  • Metrics and architectural trade-off [30 min]
  • Acceleration in memory/storage platforms [15 min]
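To preview the systolic-array session, the sketch below is a cycle-level simulation of an output-stationary systolic array computing C = A · B. Each processing element (PE) keeps one output accumulator; rows of A enter from the left and columns of B from the top, skewed by one cycle per row or column, so matching operands meet in the right PE on the right cycle. The TPU's matrix unit is commonly described as weight-stationary; this sketch uses the simpler output-stationary dataflow to illustrate the principle and does not model the TPU itself. The function name is hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def systolic_matmul(A: Matrix, B: Matrix) -> Matrix:
    """Cycle-level simulation of an output-stationary systolic array (C = A @ B).

    PE (i, j) holds accumulator C[i][j]. Row i of A enters from the left edge
    delayed by i cycles; column j of B enters from the top edge delayed by j
    cycles. Every cycle, operands shift one PE right (A) or down (B), and each
    PE multiplies the pair it holds and adds the product to its accumulator.
    """
    M, K, N = len(A), len(A[0]), len(B[0])
    C = [[0] * N for _ in range(M)]
    a_reg = [[0] * N for _ in range(M)]  # A operand currently held in each PE
    b_reg = [[0] * N for _ in range(M)]  # B operand currently held in each PE
    for t in range(M + N + K - 2):      # last useful MAC happens at t = M + N + K - 3
        # Shift A right and B down, iterating from the far edge so values are not overwritten.
        for i in range(M):
            for j in range(N - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
        for j in range(N):
            for i in range(M - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
        # Inject skewed inputs at the edges; zeros outside the valid window.
        for i in range(M):
            k = t - i
            a_reg[i][0] = A[i][k] if 0 <= k < K else 0
        for j in range(N):
            k = t - j
            b_reg[0][j] = B[k][j] if 0 <= k < K else 0
        # Multiply-accumulate in every PE: PE (i, j) now holds A[i][k] and B[k][j], k = t - i - j.
        for i in range(M):
            for j in range(N):
                C[i][j] += a_reg[i][j] * b_reg[i][j]
    return C


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Each operand is fetched from memory once and then reused as it flows across a row or column of PEs, which is how systolic arrays reduce the data movement discussed in the abstract.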

Speaker Bio

Sandip Kundu

Sandip Kundu is a Professor of Electrical and Computer Engineering at the University of Massachusetts Amherst. He recently also served as a program director at the National Science Foundation. Kundu began his career at IBM Research as a Research Staff Member, then worked at Intel Corporation as a Principal Engineer before joining UMass Amherst as a professor in 2005. He has published nearly 300 research papers in VLSI design and test, holds several key patents, including on an ultra-drowsy sleep mode for processors, and has given more than a dozen tutorials at various conferences. He is a Fellow of the IEEE, a Fellow of the Japan Society for the Promotion of Science (JSPS), and a Senior International Scientist of the Chinese Academy of Sciences, and was a Distinguished Visitor of the IEEE Computer Society. He has served as an associate editor of a number of IEEE and ACM journals. He has been the Technical Program Chair/General Chair of multiple conferences, including ICCD, ATS, ISVLSI, DFTS, and VLSI Design.


© 2022-2023, VLSID. All Rights Reserved.