COMP 790-199 - Spring 2026

Overview

Today, computer systems operate at unprecedented scale, running on billions of devices worldwide and underpinning personal computing, cloud infrastructure, industrial platforms, and critical services. As these systems grow in complexity, traditional design and analysis techniques are increasingly strained, creating new challenges in performance, reliability, and security.

This course explores how machine learning techniques can be used to address fundamental challenges in computer systems. We will study state-of-the-art research that applies ML to systems problems such as performance optimization, bug detection, reliability analysis, and security enforcement. Through in-depth discussion of research papers and hands-on projects, students will learn how to frame systems challenges as learning problems, critically evaluate ML-based system designs, and develop practical ML-driven solutions for real-world systems.

Prerequisites: COMP 530/730 (operating systems), COMP 734 (distributed systems), COMP 562 (Intro Machine Learning), or equivalent background.

Course Info

Time: Tuesday and Thursday, 5:00-6:15 PM
Room: SN115
Instructor: Sishuai Gong
- Email: sishuai@cs.unc.edu
- Office hours: 30 minutes after class or by appointment

Grading

Paper presentation and review: 40%
Class participation: 30%
Research project: 30%

Paper presentation and review

Each student will present and lead the discussion for one or two research papers during the semester. In addition, each student will complete written reviews for five research papers. Paper assignments will be finalized by Week 2.

Each presentation should be approximately 30 minutes long, followed by a short Q&A and discussion. Presenters are expected to actively engage with the audience, respond to questions, and help guide the discussion. Students should create their own slides. Copying slides directly from the paper authors is not allowed and will affect the grade, but reusing figures or diagrams from the paper or its talk is acceptable.

Students will also write reviews for assigned papers. Reviews will be submitted through a Google Form, and detailed guidelines and evaluation criteria will be provided in advance. Reviews must be written independently by the student. The use of AI tools to generate or substantially draft reviews is strictly prohibited and will be treated as a violation of academic integrity.

Class participation

Participation is central to the course. Students are required to attend all classes. Up to two absences are allowed without prior notice; additional absences must be reported and approved.

Everyone is expected to engage actively during discussions by asking questions, offering thoughts, or responding to others. The classroom should remain inclusive and respectful.

Research project

The course includes a semester-long team project on a topic related to system reliability or security. Projects are done in teams of three or four. Students who wish to work individually must request approval from the instructor. Teams should be formed by Week 2; the instructor will help with team formation if needed.

Each team will choose a topic, either from a list of suggestions provided in lecture or based on its own ideas. All topics must be approved by the instructor to ensure they are suitable in scope and relevance.

Project Milestones

Team Formation
Teams of three or four must be in place by Week 2.

Project Proposal
Teams will first give a short in-class presentation to introduce their project idea. If the proposal is approved by the instructor, the team should upload a 2-page PDF proposal within one week of the presentation, outlining the project’s goals, background, and related work. If the proposal is not approved, the team should meet with the instructor during office hours to refine and finalize the topic before submission.

Midterm Review
Each team will give a brief mid-semester review, sharing early results and discussing any challenges.

Final Report and Presentation
The final deliverables are a 6-page PDF report and an in-class presentation.

Date	Topic	Detail
01/08	Lecture: Introduction
01/13	Lecture: Use machine learning to address kernel concurrency bugs
01/15	Lecture: Network performance
	Deadline for team registration (by 01/20)
01/20	Paper presentation	- Computers Can Learn from the Heuristic Designs and Master Internet Congestion Control, SIGCOMM'23 - Achieving Fairness Generalizability for Learning-based Congestion Control with Jury, EuroSys'25
01/22	NO CLASS	Hacker Day
01/27	NO CLASS	Winter Weather
01/29	Paper presentation (over Zoom)	- LiteFlow: towards high-performance adaptive neural networks for kernel datapath, SIGCOMM'22 - Caravan: Practical Online Learning of In-Network ML Models with Labeling Agents, OSDI'24
02/03	Lecture: Resource management
02/05	Paper presentation	- SmartOS: Towards Automated Learning and User-Adaptive Resource Allocation in Operating Systems, APSys'21 - ALPS: An Adaptive Learning, Priority OS Scheduler for Serverless Functions, ATC'24
02/10	Project Proposal (Groups 1-3)
02/12	Project Proposal (Groups 4-6)
02/17	NO CLASS	Hacker Day
02/19	Paper presentation (over Zoom)	- SelfTune: Learning-based Cluster Managers, NSDI'23 - Towards VM Rescheduling Optimization Through Deep Reinforcement Learning, EuroSys'25
	Deadline for project proposal report (by 02/19)
02/24	Lecture: Data structures
02/26	Paper presentation	- The Case for Learned Index Structures, SIGMOD'18 - ALEX: An Updatable Adaptive Learned Index, SIGMOD'20
03/03	Paper presentation	- Predictive and Adaptive Failure Mitigation to Avert Production Cloud VM Interruptions, OSDI'20 - LOFT: A Lock-free and Adaptive Learned Index with High Scalability for Dynamic Workloads, EuroSys'25
03/05	Lecture: Bug detection & diagnosis
03/10	Paper presentation	- Unearthing Semantic Checks for Cloud Infrastructure-as-Code Programs, SOSP'24 - If At First You Don't Succeed, Try, Try, Again ...? Insights and LLM-informed Tooling for Detecting Retry Bugs in Software Systems, SOSP'24
03/12	Paper presentation	- SyzVegas: Beating Kernel Fuzzing Odds with Reinforcement Learning, Security'21 - KNighter: Transforming Static Analysis with LLM-Synthesized Checkers, SOSP'25
03/17	NO CLASS	Spring Break
03/19	NO CLASS	Spring Break
03/24	Mid-semester Presentation (Groups 4-6)
03/26	Mid-semester Presentation (Groups 1-3)
03/31	Paper presentation	- Automatic Root Cause Analysis via Large Language Models for Cloud Incidents, EuroSys'24 - Sleuth: A Trace-Based Root Cause Analysis System for Large-Scale Microservices with Graph Neural Networks, ASPLOS'24
04/02	NO CLASS	Well-Being Day
04/07	Paper presentation	- Murphy: Performance Diagnosis of Distributed Cloud Applications, SIGCOMM'23 - Sage: Practical & Scalable ML-Driven Performance Debugging in Microservices, ASPLOS'21
04/09	Lecture: ML integration
04/14	Paper presentation	- ChameleonAPI: Automatic and Efficient Customization of Neural Networks for ML Applications, OSDI'24 - Parrot: Efficient Serving of LLM-based Applications with Semantic Variable, OSDI'24
04/16	Paper presentation	- SuperFE: A Scalable and Flexible Feature Extractor for ML-based Traffic Analysis Applications, EuroSys'25 - Towards a Machine Learning-Assisted Kernel with LAKE, ASPLOS'23
04/21	Final Presentation (Groups 1-3)
04/23	Final Presentation (Groups 4-6)
	Deadline for project final report (by 04/28)