Towards Principled Error-Efficient Systems
Sarita Adve is the Richard T. Cheng Professor of Computer Science at the University of Illinois at Urbana-Champaign. Her research interests span the system stack, ranging from hardware to applications. Her early work on data-race-free memory consistency models led to the memory models for the Java and C++ programming languages and forms the foundation for memory models used in most hardware and software systems today. She is also known for her work on heterogeneous computing and software-driven approaches for hardware resiliency. She is a member of the American Academy of Arts and Sciences, a fellow of the ACM and IEEE, and a recipient of the ACM/IEEE-CS Ken Kennedy award, the Anita Borg Institute Women of Vision award in innovation, the ACM SIGARCH Maurice Wilkes award, and the University of Illinois campus award for excellence in graduate student mentoring. As ACM SIGARCH chair, she co-founded the CARES movement, winner of the CRA distinguished service award, to address discrimination and harassment in Computer Science research events. She received her PhD from the University of Wisconsin-Madison and her B.Tech. from the Indian Institute of Technology, Bombay.
Sources of errors in computer systems are increasing, ranging from unintentional transient errors due to high-energy particle strikes to errors deliberately introduced by approximations that trade accuracy for lower energy and/or higher performance. Traditional error resiliency techniques that mitigate all errors, and do so purely in hardware, can be unnecessarily expensive. We use the term “error efficiency” for a paradigm that allows a controlled set of errors that are acceptable to the end user’s quality of experience while maximizing efficiency metrics related to performance, power, and/or area. Error-efficient “systems” provide error efficiency through co-designed layers of the system stack, including hardware, system software, and the application.
Although error-efficient systems can provide orders-of-magnitude benefits, no discipline yet exists for designing such systems. This talk will discuss recent work towards such a discipline, including (1) program-analysis-based techniques to understand the impact of hardware errors on software; (2) extending the discipline of software engineering to account for hardware errors along with software bugs; and (3) domain-specific error efficiency techniques that match the emerging era of domain-specific hardware. These ideas lay the foundation for a new era of principled error-efficient system design.