A group of people sitting around a table, smiling and looking at a projector

Flavor

NY Systems Reading Group (NYSRG) is a place for people to learn about computer systems together.

Computer systems are the building blocks of applications and the fabric that ties them together. Databases, networks, programming languages, compilers, distributed coordination algorithms, optimizers, orchestrators, verifiers, libraries, …

NYSRG welcomes people from all backgrounds. We believe that diverse experiences enrich group discourse, and we try to find a pace suitable for everyone.

We typically read during the session. Non-reading time is dedicated to open group discussion: summary, interpretation, detailed review, criticism, and contextualization, all in the service of individual curiosity and understanding.

Our meetings are weekly on Sundays. You're not expected to have attended previous weeks.

Computers are pretty cool; let's explore!

Schedule

We curate texts that are of broad interest to systems and application designers, with excellent prose and opportunities for hands-on learning.

(Not so serious though; computers should be fun, dang it!)

Week 1 — 09/10

Writing git from scratch

Git is a version control system used by most of the world's software developers. What's under the hood? Let's get a glimpse into the workings of the .git folder, featuring content-addressed storage and hash trees.
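To make the content-addressing idea concrete, here's a minimal Python sketch of how git derives a blob's object ID (the function name and sample content are ours, but the header-plus-SHA-1 scheme is git's):

```python
import hashlib

def hash_blob(content: bytes) -> str:
    # Git stores a file's contents as a "blob" object whose ID is the
    # SHA-1 of a short header plus the raw bytes. Identical content
    # always gets the same ID -- that's content-addressed storage.
    store = b"blob " + str(len(content)).encode() + b"\x00" + content
    return hashlib.sha1(store).hexdigest()

# The object itself is written zlib-compressed to
# .git/objects/<first 2 hex chars>/<remaining 38 chars>.
print(hash_blob(b"hello world\n"))  # same ID `git hash-object` gives this file
```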

Week 2 — 09/17

Compiling mlc, the course

Compilers in machine learning are the silent toolchains that make compute possible at massive scale, on CPUs and on hardware accelerators like GPUs. How do they work? And what really goes into doing matrix multiplication fast?

Week 3 — 09/24

Perspectives on async

Cooperative and preemptive multitasking, schedulers, concurrency vs parallelism models, and how they influence language features. Case studies in Go and Rust internals, and mentions of Python, JavaScript, C#, Dart, and Lua.
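As a warm-up for the cooperative end of that spectrum, here's a toy round-robin scheduler built on Python generators. This is entirely our own sketch, not how any of those runtimes actually work, but it shows the key property: tasks run until they voluntarily yield.

```python
from collections import deque

def scheduler(tasks):
    # Toy cooperative scheduler: each task runs until it yields, then
    # the next one gets a turn. No preemption -- a task that never
    # yields would starve all the others.
    queue = deque(tasks)
    order = []
    while queue:
        task = queue.popleft()
        try:
            order.append(next(task))  # run until the task yields
            queue.append(task)        # re-enqueue; it resumes later
        except StopIteration:
            pass                      # task finished; drop it
    return order

def worker(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"  # yield = voluntarily hand control back

print(scheduler([worker("a", 2), worker("b", 3)]))
# → ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```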

Week 4 — 10/01

The database storage layer

Hosts: Mufeez Amjad and Rama Tadepalli

Storage is at the heart of databases, lying below execution and query planning, but above the file system, OS, and hardware. Let's learn about storage engines and get started building our own.

Week 5 — 10/08

Web browser security

Host: Raghav Anand

Security is hard, and browsers are incredibly complex artifacts with tens of millions of lines of code. What could go wrong? From distributed systems security to process sandboxing, and from type confusion to Spectre.

Week 6 — 10/15

Build systems

Hosts: Val Kharitonov and Fang Shuo Deng

Build processes are complex and computationally intensive. How can we make builds fast, reproducible, and flexible, all while retaining simplicity? Maybe studying their underpinnings and reimplementing them can teach us a bit about computing.
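At the core of every build system is one scheduling problem: build dependencies before the things that depend on them. A toy sketch (our own, with a made-up dependency graph; real tools also track staleness, handle cycles, and parallelize):

```python
def build_order(targets, deps):
    # Topologically sort build targets via depth-first search, so each
    # target appears after everything it depends on.
    order, seen = [], set()
    def visit(t):
        if t in seen:
            return
        seen.add(t)
        for d in deps.get(t, []):
            visit(d)
        order.append(t)
    for t in targets:
        visit(t)
    return order

# A hypothetical C project: an app linked from two object files.
deps = {"app": ["lib.o", "main.o"], "lib.o": ["lib.c"], "main.o": ["main.c"]}
print(build_order(["app"], deps))
# → ['lib.c', 'lib.o', 'main.c', 'main.o', 'app']
```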

Week 7 — 10/22

Virtualization with KVM

Complex systems tend to produce more copies of themselves, and computers are no exception to this self-referential behavior. Let's see how virtualization and emulation work on the OS and machine levels.

Week 8 — 10/29

Structured data encoding

Schema-based binary formats, and their associated languages, for specifying and serializing structured data. Design tradeoffs for RPCs, data archiving, OLAP, and embedded systems.
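One small design decision that recurs across these formats is variable-length integer encoding. Here's a sketch of the varint scheme used by Protocol Buffers (function names are ours): 7 data bits per byte, least-significant group first, high bit set when more bytes follow.

```python
def encode_varint(n: int) -> bytes:
    # Small numbers take one byte; each extra byte adds 7 bits.
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more to come
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data: bytes) -> int:
    result, shift = 0, 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return result

print(encode_varint(300).hex())  # → 'ac02', the example from the protobuf docs
```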

Week 9 — 11/05

What the k8s!

Kubernetes is, undeniably, tech's favorite system for deploying code. It's also crazy complex. Let's look at it (a healthy serving of YAML!), then at Slurm (a scheduler for supercomputers from the 2000s), so we can discuss what the essential complexity is.

Week 10 — 11/12

Time series databases

Time-series databases index massive amounts of data. They're the tools that let engineers understand and do more with systems. How do they work, and what are the key data structures and ideas in balancing their speed, storage, and cost?
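One key idea here, popularized by Facebook's Gorilla paper, is delta-of-delta timestamp encoding: regularly spaced samples collapse into runs of zeros, which compress extremely well. A toy sketch (our own; the real format packs these into variable-width bit fields, and we assume at least two timestamps):

```python
def delta_of_delta(timestamps):
    # Encode as [first value, first delta, then deltas-of-deltas].
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    dod = [b - a for a, b in zip(deltas, deltas[1:])]
    return [timestamps[0], deltas[0]] + dod

def restore(encoded):
    out = [encoded[0]]
    delta = encoded[1]
    out.append(out[-1] + delta)
    for d in encoded[2:]:
        delta += d               # undo the second-order difference
        out.append(out[-1] + delta)
    return out

# Samples ~60s apart, one arriving a second late:
print(delta_of_delta([1000, 1060, 1120, 1180, 1241]))
# → [1000, 60, 0, 0, 1] -- mostly zeros, ideal for compression
```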

Week 11 — 11/19

Make a plasmid viewer

You may have no idea what a plasmid is, me neither! It's a circular DNA sequence. But treat this as a complex data visualization and UI exercise. We'll each try to write an interactive SVG viewer, learning about libraries and reactive programming models.

Week 12 — 12/03

The JVM specification

We'll read the specification for the Java Virtual Machine, with a focus on Chapter 3 (compilation). Java bytecode is the most successful cross-platform compiled format in the world.

Week 13 — 01/07

Compilers compiling compilers

Compilers compile code. But compilers are also code themselves. It's 2024, and as we reflect on our past and present, let's also reflect on some of the classic and modern takes on self-referentiality of compilers and staging.

Week 14 — 01/14

Compression with zstd

Exploring state-of-the-art lossless data compression. How do you pack big things in a small package, fast? (Note: This is a hard algorithm and we probably won't make it all the way through in one sitting.)

Week 15 — 01/21

Media codecs

Still talking about compression like last week, but lossy codecs come with spicy cosine transforms, color spaces, and legal trickery. Video encoding is also absurdly complicated; who knew an algorithm could be broken up into 5000+ patents!

Week 16 — 02/04

zstd spec, revisited

We'll continue where we left off in Week 14 by understanding finite-state entropy in depth. Then, buckle down to read the actual zstd spec, followed by Brotli.

Week 17 — 02/11

Linux executables

What goes into executables and dynamic linking? How do they work, and how much of their functionality is engineered versus operating system magic? We'll start reading a series by Amos Wenger.

Week 18 — 02/25

Zig compiler internals

A blog post by Mitchell Hashimoto, dissecting the Zig compiler. Zig is a self-hosted, low-level compiled systems programming language.

Week 19 — 03/03

The WireGuard protocol

Host: Abel Matthew

The working operations of a secure VPN tunnel. Public-key cryptography and forward secrecy, parallelism techniques, and kernel networking.

Week 20 — 03/17

String indexing algorithms

What ties together FASTA, ripgrep, bzip2, Prometheus, and Sublime Text? Algorithms and data structures for practical string indexing and search, with a healthy dose of automata theory.
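One recurring idea in indexed search (famously used by Google Code Search) is the trigram inverted index: map every 3-character substring to the documents containing it, then intersect posting sets to narrow a query. A toy version, entirely our own sketch; real engines verify candidates with an exact scan afterwards.

```python
from collections import defaultdict

def build_trigram_index(docs):
    # Map each trigram to the set of document IDs containing it.
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for i in range(len(text) - 2):
            index[text[i:i + 3]].add(doc_id)
    return index

def candidates(index, query, num_docs):
    # A doc can only match if it contains *all* the query's trigrams.
    result = set(range(num_docs))
    for i in range(len(query) - 2):
        result &= index.get(query[i:i + 3], set())
    return result

docs = ["the quick brown fox", "pack my box", "quiz night"]
idx = build_trigram_index(docs)
print(candidates(idx, "quick", len(docs)))  # → {0}
```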

Week 21 — 03/24

Distributed training

Training big ML models is the elephant in the room. Everyone starts by talking about compute, but it ends up being mostly about networking. Hello, data movement!

Week 22 — 03/31

Linux kernel programming

We'll read through a recent guide on Linux kernel module programming in C. (Before the meeting, get an x86-64 cloud VM with a fresh install of Ubuntu 22.04.)

Week 23 — 04/14

io_uring

Reading about an up-and-coming Linux subsystem for high-performance async I/O. Thinking about memory access models, buffer ownership, and fault-tolerant parallelism.

Week 24 — 04/21

Memory allocators

A discussion of general-purpose memory allocators. We'll focus on the successful jemalloc, and the newer but promising mimalloc. Let's peruse some source code if time permits.
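A flavor of one idea both allocators share, sketched loosely in Python (our own toy, nothing like the real implementations): round each request up to a size class and serve it from a per-class free list, making alloc/free of same-sized objects O(1).

```python
class ToyAllocator:
    # Toy size-class allocator in the spirit of jemalloc/mimalloc.
    SIZE_CLASSES = [16, 32, 64, 128, 256]

    def __init__(self):
        self.free_lists = {c: [] for c in self.SIZE_CLASSES}
        self.next_addr = 0  # bump pointer for fresh "memory"

    def _size_class(self, size):
        for c in self.SIZE_CLASSES:
            if size <= c:
                return c
        raise ValueError("large allocations take a different path")

    def malloc(self, size):
        c = self._size_class(size)
        if self.free_lists[c]:
            return self.free_lists[c].pop()  # O(1): reuse a freed block
        addr = self.next_addr
        self.next_addr += c                  # carve out fresh space
        return addr

    def free(self, addr, size):
        self.free_lists[self._size_class(size)].append(addr)

a = ToyAllocator()
p = a.malloc(20)   # rounds up to the 32-byte class
a.free(p, 20)
q = a.malloc(30)   # same class, so it reuses the freed block
print(p == q)      # → True
```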

Week 25 — 04/28

AOSA Volume 1

Selected readings from The Architecture of Open Source Applications, Volume 1. How the software we know and love was designed, redesigned, and built.

Week 26 — 05/05

AOSA Volume 2, part 1

Selected readings from The Architecture of Open Source Applications, Volume 2.

Week 27 — 05/12

The Ceph trilogy, parts 1+2

Host: Ori Bernstein

A trio of classic papers on building a distributed file system from the ground up for exabyte-scale storage. How the biggest organizations in the world keep track of data. Let's start with the first two papers: CRUSH and RADOS.

Week 28 — 05/19

The Ceph trilogy, part 3

Host: Ori Bernstein

We'll continue where we left off from last week, reading the Ceph distributed file system paper from OSDI '06. Ceph is a near-POSIX file system built on CRUSH and RADOS.

Organizers

This is being run by me, Eric! (Twitter: @ekzhang1)

I previously ran a similar reading group at Harvard for a year.