Overview
Today, systems software runs on billions of devices across the globe, powering personal devices, cloud data centers, industrial machines, and healthcare equipment. As society becomes increasingly reliant on digital infrastructure, ensuring that computer systems are reliable and secure has become more important than ever.
This course focuses on understanding key challenges in building reliable and secure systems, and examines modern techniques to address them. Through reading and discussing research papers, as well as working on hands-on projects, students will learn how to analyze systems problems and explore ways to improve system reliability and security.
Prerequisite: COMP 530/730 (operating systems), COMP 734 (distributed systems), or equivalent background.
Course Info
- Time: Tuesday and Thursday 12:30pm - 1:45pm
- Room: FB120
- Syllabus: Download
- Instructor: Sishuai Gong
- Email: sishuai@cs.unc.edu
- Office hours: Tuesday 1:45pm - 2:45pm
Grading
- Paper presentation: 30%
- Class participation: 30%
- Research project: 40%
Paper presentation
Each student will present and lead the discussion for several research papers over the semester. Assignments will be finalized by Week 2.
Presentations should be 30 minutes long, followed by a short Q&A. They should cover the paper's motivation, research question, key contributions, and main technical ideas. Presenters are also expected to respond to questions and help guide the discussion.
Students should create their own slides. Copying slides directly from the paper authors is not allowed and will affect the grade. It's acceptable to reuse figures or diagrams from the paper or its conference talk, as long as they are properly integrated.
Class participation
Participation is central to the course. Students are expected to attend all classes. One unexcused absence is allowed without prior notice; additional absences must be reported and approved.
Everyone is expected to engage actively during discussions by asking questions, offering thoughts, or responding to others. The classroom should remain inclusive and respectful.
Research project
The course includes a semester-long team project on a topic related to system reliability or security. Projects are done in teams of two or three. Students who wish to work individually must request approval from the instructor. Teams should be formed by Week 2; the instructor will help with team formation if needed.
Each team will choose a topic, either from a list of suggestions (provided during the lecture) or based on their own ideas. All topics must be approved to ensure they are suitable in scope and relevance.
Project Milestones
- Team Formation
Teams of two or three must be in place by Week 2.
- Topic Selection
Each team will meet with the instructor to discuss the proposed topic and confirm approval.
- Project Proposal
A 2-page PDF proposal is due early in the semester, outlining the project's goals, background, and related work. Teams will also give a short in-class presentation to introduce their project.
- Midterm Review
Each team will give a brief mid-semester review, sharing early results and discussing any challenges.
- Final Report and Presentation
The final deliverables include a 6-page PDF report, along with an in-class presentation.
Date | Topic | Detail |
---|---|---|
8/19 | Course introduction | |
8/21 | Systems, reliability, and security | |
8/26 | Overview of static and dynamic analysis | |
8/28 | Overview of symbolic execution, verification, and machine learning techniques | |
Deadline for team registration (8/29 12pm) | ||
9/2 | Individual Project Meetings | |
9/4 | Empirical studies | - An empirical study of operating systems errors, SOSP'01 - One Simple API Can Cause Hundreds of Bugs: An Analysis of Refcounting Bugs in All Modern Linux Kernels, SOSP'23 |
9/9 | Project proposal presentation | - Project 1 - Project 2 - Project 3 - Project 4 |
9/11 | Project proposal presentation | - Project 5 - Project 6 - Project 7 |
Deadline for project proposal report | ||
9/16 | Empirical studies | - Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-intensive Systems, OSDI'14 - Demystifying and Checking Silent Semantic Violations in Large Distributed Systems, OSDI'22 |
9/18 | Empirical studies | - Learning from Mistakes: A Comprehensive Study on Real World Concurrency Bug Characteristics, ASPLOS'08 - A comprehensive study on deep learning bug characteristics, FSE'19 |
9/23 | Static analysis | - LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation, CGO'04 - Path-Sensitive and Alias-Aware Typestate Analysis for Detecting OS Bugs, ASPLOS'22 |
9/25 | Static analysis | - RacerX: Effective, Static Detection of Race Conditions and Deadlocks, SOSP'03 - Where Does It Go? Refining Indirect-Call Targets with Multi-Layer Type Analysis, CCS'19 |
9/30 | Dynamic analysis | - Efficient and Scalable Thread-Safety Violation Detection: Finding thousands of concurrency bugs during testing, SOSP'19 - OZZ: Identifying Kernel Out-of-Order Concurrency Bugs with In-Vivo Memory Access Reordering, SOSP'24 |
10/2 | Dynamic analysis | - Automatic Reliability Testing for Cluster Management Controllers, OSDI'22 - Validating JIT Compilers via Compilation Space Exploration, SOSP'23 |
10/7 | NO CLASS | Well-being Day |
10/9 | Project midterm presentation | - Project 5 - Project 2 - Project 3 - Project 4 |
10/14 | Project midterm presentation | - Project 1 - Project 6 - Project 7 |
10/16 | NO CLASS | Fall Break |
10/21 | Dynamic analysis | - DeepXplore: Automated Whitebox Testing of Deep Learning Systems, SOSP'17 - Training with Confidence: Catching Silent Errors in Deep Learning Training with Automated Proactive Checks, OSDI'25 |
10/23 | Symbolic execution | - KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs, OSDI'08 - Mousse: A System for Selective Symbolic Execution of Programs with Untamed Environments, EuroSys'20 |
10/28 | Symbolic execution | - S2E: A Platform for In Vivo Multi-Path Analysis of Software Systems, ASPLOS'11 - Automated Reasoning and Detection of Specious Configuration in Large Systems with Symbolic Execution, OSDI'20 |
10/30 | Verification | - seL4: Formal Verification of an OS Kernel, SOSP'09 - Scaling Symbolic Evaluation for Automated Verification of Systems Code with Serval, SOSP'19 |
11/4 | New system design | - Operating System Support for Safe and Efficient Auxiliary Execution, OSDI'22 - RedLeaf: Isolation and Communication in a Safe Operating System, OSDI'20 |
11/6 | New system design | - PVM: Efficient Shadow Paging for Deploying Secure Containers in Cloud-native Environment, SOSP'23 - BlackBox: A Container Security Monitor for Protecting Containers on Untrusted Operating Systems, OSDI'22 |
11/11 | Machine learning for systems | - Automatic Root Cause Analysis via Large Language Models for Cloud Incidents, EuroSys'24 - Predictive and Adaptive Failure Mitigation to Avert Production Cloud VM Interruptions, OSDI'20 |
11/13 | Machine learning for systems | - KernelGPT: Enhanced Kernel Fuzzing via Large Language Models, ASPLOS'25 - KNighter: Transforming Static Analysis with LLM-Synthesized Checkers, SOSP'25 |
11/18 | Project final presentation | - Project 1 - Project 2 - Project 3 - Project 4 |
11/20 | Project final presentation | - Project 5 - Project 6 - Project 7 |
11/25 | NO CLASS | Project report writing |
Deadline for project final report | ||
11/27 | NO CLASS | Thanksgiving Recess |
12/2 | Course review | |