Overview

Today, systems software runs on billions of devices across the globe, powering personal devices, cloud data centers, industrial machines, and healthcare equipment. As society becomes increasingly reliant on digital infrastructure, ensuring that computer systems are reliable and secure has become more important than ever.

This course focuses on understanding key challenges in building reliable and secure systems, and examines modern techniques to address them. Through reading and discussing research papers, as well as working on hands-on projects, students will learn how to analyze systems problems and explore ways to improve system reliability and security.

Prerequisite: COMP 530/730 (operating systems), COMP 734 (distributed systems), or equivalent background.



Course Info



Grading


Paper presentation

Each student will present and lead the discussion for several research papers over the semester. Assignments will be finalized by Week 2.

Presentations should be 30 minutes long, followed by a short Q&A. They should cover the paper's motivation, research question, key contributions, and main technical ideas. Presenters are also expected to respond to questions and help guide the discussion.

Students should create their own slides. Copying slides directly from the paper authors is not allowed and will affect the grade. It's acceptable to reuse figures or diagrams from the paper or its conference talk, as long as they are properly integrated.


Class participation

Participation is central to the course. Students are expected to attend all classes. One unexcused absence is allowed without prior notice; additional absences must be reported and approved.

Everyone is expected to engage actively during discussions by asking questions, offering thoughts, or responding to others. The classroom should remain inclusive and respectful.


Research project

The course includes a semester-long team project on a topic related to system reliability or security. Projects are done in teams of two or three. Students who wish to work individually must request approval from the instructor. Teams should be formed by Week 2; the instructor will help with team formation if needed.

Each team will choose a topic, either from a list of suggestions (provided during the lecture) or based on their own ideas. All topics must be approved to ensure they are suitable in scope and relevance.


Project Milestones



Course Schedule

Date | Topic | Detail
8/19 | Course introduction |
8/21 | Systems, reliability, and security |
8/26 | Overview of static and dynamic analysis |
8/28 | Overview of symbolic execution, verification, and machine learning techniques |
| Deadline for team registration (8/29 12pm) |
9/2 | Individual project meetings |
9/4 | Empirical studies | An Empirical Study of Operating Systems Errors, SOSP'01
9/4 | Empirical studies | One Simple API Can Cause Hundreds of Bugs: An Analysis of Refcounting Bugs in All Modern Linux Kernels, SOSP'23
9/9 | Project proposal presentation | Project 1
9/9 | Project proposal presentation | Project 2
9/9 | Project proposal presentation | Project 3
9/9 | Project proposal presentation | Project 4
9/11 | Project proposal presentation | Project 5
9/11 | Project proposal presentation | Project 6
9/11 | Project proposal presentation | Project 7
| Deadline for project proposal report |
9/16 | Empirical studies | Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-intensive Systems, OSDI'14
9/16 | Empirical studies | Demystifying and Checking Silent Semantic Violations in Large Distributed Systems, OSDI'22
9/18 | Empirical studies | Learning from Mistakes: A Comprehensive Study on Real World Concurrency Bug Characteristics, ASPLOS'08
9/18 | Empirical studies | A Comprehensive Study on Deep Learning Bug Characteristics, FSE'19
9/23 | Static analysis | LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation, CGO'04
9/23 | Static analysis | Path-Sensitive and Alias-Aware Typestate Analysis for Detecting OS Bugs, ASPLOS'22
9/25 | Static analysis | RacerX: Effective, Static Detection of Race Conditions and Deadlocks, SOSP'03
9/25 | Static analysis | Where Does It Go? Refining Indirect-Call Targets with Multi-Layer Type Analysis, CCS'19
9/30 | Dynamic analysis | Efficient and Scalable Thread-Safety Violation Detection: Finding Thousands of Concurrency Bugs During Testing, SOSP'19
9/30 | Dynamic analysis | OZZ: Identifying Kernel Out-of-Order Concurrency Bugs with In-Vivo Memory Access Reordering, SOSP'24
10/2 | Dynamic analysis | Automatic Reliability Testing for Cluster Management Controllers, OSDI'22
10/2 | Dynamic analysis | Validating JIT Compilers via Compilation Space Exploration, SOSP'23
10/7 | NO CLASS | Well-being Day
10/9 | Project midterm presentation | Project 5
10/9 | Project midterm presentation | Project 2
10/9 | Project midterm presentation | Project 3
10/9 | Project midterm presentation | Project 4
10/14 | Project midterm presentation | Project 1
10/14 | Project midterm presentation | Project 6
10/14 | Project midterm presentation | Project 7
10/16 | NO CLASS | Fall Break
10/21 | Dynamic analysis | DeepXplore: Automated Whitebox Testing of Deep Learning Systems, SOSP'17
10/21 | Dynamic analysis | Training with Confidence: Catching Silent Errors in Deep Learning Training with Automated Proactive Checks, OSDI'25
10/23 | Symbolic execution | KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs, OSDI'08
10/23 | Symbolic execution | Mousse: A System for Selective Symbolic Execution of Programs with Untamed Environments, EuroSys'20
10/28 | Symbolic execution | S2E: A Platform for In Vivo Multi-Path Analysis of Software Systems, ASPLOS'11
10/28 | Symbolic execution | Automated Reasoning and Detection of Specious Configuration in Large Systems with Symbolic Execution, OSDI'20
10/30 | Verification | seL4: Formal Verification of an OS Kernel, SOSP'09
10/30 | Verification | Scaling Symbolic Evaluation for Automated Verification of Systems Code with Serval, SOSP'19
11/4 | New system design | Operating System Support for Safe and Efficient Auxiliary Execution, OSDI'22
11/4 | New system design | RedLeaf: Isolation and Communication in a Safe Operating System, OSDI'20
11/6 | New system design | PVM: Efficient Shadow Paging for Deploying Secure Containers in Cloud-native Environment, SOSP'23
11/6 | New system design | BlackBox: A Container Security Monitor for Protecting Containers on Untrusted Operating Systems, OSDI'22
11/11 | Machine learning for systems | Automatic Root Cause Analysis via Large Language Models for Cloud Incidents, EuroSys'24
11/11 | Machine learning for systems | Predictive and Adaptive Failure Mitigation to Avert Production Cloud VM Interruptions, OSDI'20
11/13 | Machine learning for systems | KernelGPT: Enhanced Kernel Fuzzing via Large Language Models, ASPLOS'25
11/13 | Machine learning for systems | KNighter: Transforming Static Analysis with LLM-Synthesized Checkers, SOSP'25
11/18 | Project final presentation | Project 1
11/18 | Project final presentation | Project 2
11/18 | Project final presentation | Project 3
11/18 | Project final presentation | Project 4
11/20 | Project final presentation | Project 5
11/20 | Project final presentation | Project 6
11/20 | Project final presentation | Project 7
11/25 | NO CLASS | Project report writing
| Deadline for project final report |
11/27 | NO CLASS | Thanksgiving Recess
12/2 | Course review |