NYSRG

A place for people to learn about computer systems together.

Computer systems are the building blocks of applications and the fabric that ties them together. (Databases, networks, virtualization, programming languages, compilers, distributed algorithms, optimizers, orchestrators, verifiers, libraries, …)

We meet weekly in NYC. Each month, we pick a different topic in computer systems, read about it, and work on projects in that area.

If you're not an expert, you can still come by! We welcome people from all backgrounds. Technology is used by everyone, and diverse experiences enrich communal discourse. The goal here is to learn, not to make commercial software.

Computers are pretty cool; let's explore!

Join NYSRG

Focus areas

NYSRG chooses a monthly focus area. Usually, this will be an unfamiliar but important topic adjacent to computer systems. We welcome developers and designers to read and collaborate on projects together over the course of the month.

Hopefully, you'll meet technologists of a common mind along the way.

We will read some of the SSA book, which covers construction, analysis, and code generation in compilers using static single assignment (SSA) form. SSA is common in the IRs of systems like GCC, LLVM, XLA/HLO/MLIR, V8, and Java.

Static single assignment

SSA Book

“How I implement SSA form”
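For a feel of what "single assignment" means, here is a minimal sketch that renames straight-line code so each variable is assigned exactly once. The `to_ssa` function and the `(target, op, args)` tuple format are made up for illustration. It ignores control flow entirely, so it never needs phi functions, which is where the real work in the SSA book begins.

```python
def to_ssa(statements):
    """Rename straight-line assignments so every name is assigned once.

    `statements` is a list of (target, op, args) tuples, a hypothetical
    mini-IR used only for this sketch. No branches or loops are handled.
    """
    version = {}   # variable -> number of assignments seen so far
    current = {}   # variable -> its latest SSA name
    out = []
    for target, op, args in statements:
        # Uses refer to the latest definition; unknown names are inputs.
        new_args = [current.get(a, a) for a in args]
        version[target] = version.get(target, 0) + 1
        name = f"{target}{version[target]}"
        current[target] = name
        out.append((name, op, new_args))
    return out


program = [
    ("x", "add", ["a", "b"]),   # x = a + b
    ("x", "mul", ["x", "c"]),   # x = x * c
    ("y", "add", ["x", "x"]),   # y = x + x
]
for name, op, args in to_ssa(program):
    print(name, "=", op, *args)
```

After renaming, the second assignment to `x` becomes `x2`, and `y1` reads `x2` rather than an ambiguous `x`.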

Special topics

From September 2023 to December 2024, NYSRG organized group discussions on a different topic each week. We still hold occasional meetings on special topics, but we've since transitioned to a monthly project-led format.

Week 1 — Sep 10, 2023

Writing git from scratch

Git is a version control system used by most of the world’s software developers. What’s under the hood? Let’s get a glimpse into the workings of the “.git” folder, featuring content-addressed storage and hash trees.

Commits are snapshots, not diffs •Git's database internals I •Write yourself a Git
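As a small taste of content-addressed storage, the sketch below computes the ID Git gives a file's contents (a "blob"). It assumes the default SHA-1 object format, where the ID is the hash of a short header plus the raw bytes. The function name `git_blob_id` is made up for this example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes "blob <size in bytes>\0" followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b"hello world\n"))
```

The result should match what `git hash-object` reports for a file with the same bytes. Identical contents always get the same ID, which is why Git can store each version of a file only once.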

Week 2 — Sep 17, 2023

Compiling mlc, the course

Compilers in machine learning are the silent toolchains that make compute possible at massive scale, on CPUs and on hardware accelerators like GPUs. How do they work? And what really goes into doing matrix multiplication fast?

Halide •PyTorch •Introduction to ML compilers •mlc.ai

Week 3 — Sep 24, 2023

Perspectives on async

Cooperative and preemptive multitasking, schedulers, concurrency vs parallelism models, and how they influence language features. Case studies in Go and Rust internals, and mentions of Python, JavaScript, C#, Dart, and Lua.

What Color is Your Function? •Goroutines •Async/Await •Structured concurrency

Week 4 — Oct 1, 2023

The database storage layer

Hosts: Mufeez Amjad and Rama Tadepalli

Storage is at the heart of databases, lying below execution and query planning, but above the file system, OS, and hardware. Let’s learn about storage engines and get started with our own.

Disk-Oriented DBMS Overview •mmap •LSM Trees •Database Internals

Week 5 — Oct 8, 2023

Web browser security

Host: Raghav Anand

Security is hard, and browsers are incredibly complex artifacts with tens of millions of lines of code. What could go wrong? From distributed systems security to process sandboxing, and from type confusion to Spectre.

Understanding The Web Security Model •Spectre.js •Site Isolation •CVE-2023-3420

Week 6 — Oct 15, 2023

Build systems

Hosts: Val Kharitonov and Fang Shuo Deng

Build processes are complex and computationally intensive. How can we make builds fast, reproducible, and flexible, all while retaining simplicity? Maybe studying their underpinnings and reimplementing them can teach us a bit about computing.

Build Basics (Bazel) •Build Systems à la Carte

Week 7 — Oct 22, 2023

Virtualization with KVM

Complex systems tend to produce more copies of themselves, and computers are no exception to this self-referential behavior. Let’s see how virtualization and emulation work on the OS and machine levels.

JSLinux •Virtio •kvm-ioctls •Book of crosvm •KVM/ARM

Week 8 — Oct 29, 2023

Structured data encoding

Schema-based binary formats, and their associated languages, for specifying and serializing structured data. Design tradeoffs for RPCs, data archiving, OLAP, and embedded.

Protocol Buffers •Cap'n Proto •rkyv •Apache Arrow •Cornflakes

Week 9 — Nov 5, 2023

What the k8s!

Kubernetes is, undeniably, tech’s favorite system for deploying code. It’s also crazy complex. Let’s look at it (a healthy serving of YAML!), then at Slurm (a scheduler for supercomputers from the 2000s), so we can discuss what the essential complexity is.

Kubernetes (O'Reilly) •R with k8s •Cluster Architecture •CRI Spec •SLURM

Week 10 — Nov 12, 2023

Time series databases

Time-series databases index massive amounts of data. They’re the tools that let engineers understand and do more with systems. How do they work, and what are the key data structures and ideas in balancing their speed, storage, and cost?

InfluxDB •Prometheus •Gorilla (VLDB '15) •Monarch (VLDB '20)

Week 11 — Nov 19, 2023

Make a plasmid viewer

You may have no idea what a plasmid is. Me neither! It’s a circular DNA sequence. But treat this as a complex data visualization and UI exercise. We’ll each try to write an interactive SVG viewer, learning about libraries and reactive programming models.

SeqViz •Teselagen's Vector Editor

Week 12 — Dec 3, 2023

The JVM specification

We’ll read the specification for the Java Virtual Machine, with a focus on Chapter 3 (compilation). Java bytecode is arguably the most successful cross-platform compiled bytecode in the world.

The Java Virtual Machine Specification (Java SE 21)

Week 13 — Jan 7, 2024

Compilers compiling compilers

Compilers compile code. But compilers are also code themselves. It’s 2024, and as we reflect on our past and present, let’s also reflect on some of the classic and modern takes on self-referentiality of compilers and staging.

Reflections on Trusting Trust •Futamura •Essence of Incremental Computation

Week 14 — Jan 14, 2024

Compression with zstd

Exploring state-of-the-art lossless data compression. How do you pack big things in a small package, fast? (Note: This is a hard algorithm and we probably won’t make it all the way through in one sitting.)

LZ77 •Asymmetric Numeral Systems •Zstd blog post •Zstd spec •Brotli

Week 15 — Jan 21, 2024

Media codecs

Still talking about compression like last week, but lossy codecs come with spicy cosine transforms, color spaces, and legal trickery. Video encoding is also absurdly complicated; who knew an algorithm could be broken up into 5000+ patents!

Web audio codec guide •Web video codec guide •PNG •JPEG •MP3 •VP8

Week 16 — Feb 4, 2024

zstd spec, revisited

We’ll continue where we left off in Week 14 by understanding finite-state entropy in depth. Then, buckle down to read the actual zstd spec, followed by Brotli.

Finite State Entropy (series) •RFC 8878 (Zstd) •RFC 7932 (Brotli)

Week 17 — Feb 11, 2024

Linux executables

What goes into executables and dynamic linking? How do they work, and how much of their functionality is engineered versus operating system magic? We’ll start reading a series by Amos Wenger.

Making our own executable packer (series)

Week 18 — Feb 25, 2024

Zig compiler internals

A blog post by Mitchell Hashimoto, dissecting the Zig compiler. Zig is a self-hosted, low-level compiled systems programming language.

A half-hour to learn Zig •Zig compiler internals

Week 19 — Mar 3, 2024

The WireGuard protocol

Host: Abel Matthew

The inner workings of a secure VPN tunnel. Public-key cryptography and forward secrecy, parallelism techniques, and kernel networking.

WireGuard •Black Hat USA 2018

Week 20 — Mar 17, 2024

String indexing algorithms

What ties together FASTA, ripgrep, bzip2, Prometheus, and Sublime Text? Algorithms and data structures for practical string indexing and search, with a healthy dose of practical automata theory.

Skew Algorithm (2003) •Index 1,600,000,000 Keys with Automata and Rust

Week 21 — Mar 24, 2024

Distributed training

Training big ML models is the elephant in the room. So everyone starts by talking about compute, but it ends up being mostly about networking. Hello, data movement!

Allreduce (2017) •How to Train Really Large Models on Many GPUs? (2021) •Everything about Distributed Training and Efficient Finetuning (2024) •Tensor Parallelism with jax.pjit (2022)

Week 22 — Mar 31, 2024

Linux kernel programming

We’ll read through a recent guide on Linux kernel module programming in C. (Before the meeting, get an x86-64 cloud VM with a fresh install of Ubuntu 22.04.)

The Linux Kernel Module Programming Guide

Week 23 — Apr 14, 2024

io_uring

Reading about an up-and-coming Linux subsystem for high-performance async I/O. Thinking about memory access models, buffer ownership, and fault-tolerant parallelism.

Is there really no asynchronous block I/O on Linux? •Lord of the io_uring •Notes on io-uring •tokio-rs/io-uring

Week 24 — Apr 21, 2024

Memory allocators

A discussion of general-purpose memory allocators. We’ll focus on the successful jemalloc, and the newer but promising mimalloc. Let’s peruse some source code if time permits.

jemalloc (2006) •mimalloc (2019) •std.heap.general_purpose_allocator

Week 25 — Apr 28, 2024

AOSA Volume 1

Selected readings from The Architecture of Open Source Applications, Volume 1. How the software we know and love was designed, redesigned, and built.

Audacity •Bash •CMake •HDFS •LLVM •Riak and Erlang/OTP

Week 26 — May 5, 2024

AOSA Volume 2, part 1

Selected readings from The Architecture of Open Source Applications, Volume 2.

matplotlib •PyPy •processing.js •The Glasgow Haskell Compiler

Week 27 — May 12, 2024

The Ceph trilogy, parts 1+2

Host: Ori Bernstein

A trio of classic papers on building a distributed file system from the ground up for exabyte-scale storage. How the biggest organizations in the world keep track of data. Let’s start with the first two papers: CRUSH and RADOS.

CRUSH (2006) •RADOS (2007)

Week 28 — May 19, 2024

The Ceph trilogy, part 3

Host: Ori Bernstein

We’ll continue where we left off from last week, reading the Ceph distributed file system paper from OSDI ’06. Ceph is a near-POSIX file system built on CRUSH and RADOS.

Ceph (2006)

Week 29 — May 26, 2024

JAX from scratch

JAX is a differentiable programming language embedded in Python, which implements forward and reverse-mode automatic differentiation via functors and a tracing JIT. Come for a unique mix of vector calculus + category theory + compilers, as we make our own JAX.

JAX Quickstart •Pushforward and pullback •Autodidax: JAX core from scratch
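To make "forward-mode automatic differentiation" a little more concrete, here is a toy sketch using dual numbers in plain Python. It is not how JAX is implemented (JAX traces functions and transforms them); the `Dual`, `lift`, and `jvp` names here are stand-ins written for this example, and only addition and multiplication are supported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative along one input direction."""
    primal: float
    tangent: float

    def __add__(self, other):
        other = lift(other)
        return Dual(self.primal + other.primal, self.tangent + other.tangent)

    __radd__ = __add__

    def __mul__(self, other):
        other = lift(other)
        # Product rule: (uv)' = u v' + u' v
        return Dual(self.primal * other.primal,
                    self.primal * other.tangent + self.tangent * other.primal)

    __rmul__ = __mul__


def lift(x):
    """Treat plain numbers as constants with zero tangent."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


def jvp(f, x, v):
    """Evaluate f at x and push the tangent v forward through it."""
    out = lift(f(Dual(float(x), float(v))))
    return out.primal, out.tangent


def f(x):
    return 3 * x * x + 2 * x + 1


print(jvp(f, 2.0, 1.0))  # (17.0, 14.0): f(2) = 17 and f'(2) = 6*2 + 2 = 14
```

Reverse mode (the pullback) runs the same rules backwards from the output, which is the part Autodidax spends most of its time on.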

Week 30 — Jun 2, 2024

Code reading: Wasmi

What does it take to write a fast interpreter in 2024? We’ll read the source code of a recent and relatively small (~50,000 LoC) runtime for WebAssembly, with interesting tradeoffs between startup speed and performance.

Wasmi's new execution engine •Wasmi source •Wasmtime source

Week 31 — Jun 9, 2024

Disaggregated databases

Disaggregation is a technique to separate compute, storage, and memory needs in warehouse-scale computing. We’ll read about how people balance these features with the limitations of networked systems.

Tutorial: Disaggregated Database Systems (SIGMOD '23) •Aurora (SIGMOD '17) •Snowflake (NSDI '20)

Week 32 — Jun 16, 2024

FoundationDB

FoundationDB is a distributed, transactional key-value store that underpins several new database systems. It claims to be strict serializable and lock-free, while having very strong failure tolerance. We’ll read the FoundationDB paper and some docs.

Blog Post •FoundationDB (SIGMOD '21) •Technical Overview

Week 33 — Jun 23, 2024

Memory models

A series of three posts on how memory consistency is preserved by multicore processors: in hardware, in programming languages, and specifically in the Go programming language.

Memory Models (rsc)

Week 34 — Jun 30, 2024

New sorting implementations

Sorting is one of the most common problems in computing. It’s also heavily optimized. Let’s look at research into two hand-tuned sorting implementations that are tailored for performance, which were recently merged into the Rust standard library.

driftsort •ipnsort •Sort safety •Branchless partitions

Week 35 — Jul 7, 2024

Code reading: Redis

We’ll read the source code of Redis 1.3.6, the oldest tagged release (March 18, 2010). Redis is a cornerstone of modern systems, and its data structures power much of the Internet. But in 2010, Redis was mostly a single 9000-line C file written by one developer.

Redis 1.3.6

Week 36 — Jul 28, 2024

Kernel instrumentation

The foundations of dynamic instrumentation in kernels. Case studies on DTrace (Solaris) and eBPF (Linux, Windows). Note that eBPF has a very large scope; we’ll discuss the groundwork rather than applications in this meeting.

Dynamic Instrumentation of Production Systems (2004) •What is eBPF? •PREVAIL: Understanding the Windows eBPF Verifier (2023)

Week 37 — Aug 4, 2024

Intel SGX

Trusted computing: how can you possibly run sensitive code on a computer that’s been completely compromised? Obviously homomorphic encryption isn’t practical in 2024, so let’s try to learn a bit about how SGX promises this.

Intel SGX Explained

Week 38 — Aug 11, 2024

Code reading: Solid.js

A study on reactive programming. Solid is a popular frontend framework for user interfaces. It’s known for fine-grained reactivity and minimal runtime overhead, being faster than React and Svelte. We’ll read the source code to see how it works (~6000 lines).

Solid Tutorial •Solid

Week 39 — Sep 1, 2024

Garbage collection

We’ll read the literature about garbage collection algorithms for Java. While the JVM’s heap allocation interfaces have been largely the same for decades, its garbage collection algorithms have evolved to reflect changing needs.

Garbage-First Garbage Collection (2004) •Shenandoah GC (2016) •LXR (2022)

Week 40 — Sep 8, 2024

Code reading: MoonRay

We’ll explore DreamWorks’ 3D renderer, used in films like How to Train Your Dragon. MoonRay (~650,000 LoC) is a state-of-the-art system with GPU-accelerated ray tracing, real-time distributed computation, denoising, and a huge number of materials and simulations.

MoonRay (SIGGRAPH '17) •Vectorized Production Path Tracing (HPG '17) •dreamworksanimation/openmoonray

Week 41 — Sep 22, 2024

Embedded Rust

We will read about writing embedded firmware and drivers in Rust. I bought one STM32F3DISCOVERY kit (STM32F303VC MCU) for us to share, but feel free to bring your own hardware.

The Embedded Rust Book •Embassy •RTIC •Drivers in Rust

Week 42 — Sep 29, 2024

Amazon's distributed storage

We’ll read some reflections from scientists on AWS S3, the oldest service of the world’s largest cloud provider. As of 2024, S3 stores over 350,000,000,000,000 objects (~100,000,000,000,000,000,000 bytes) with 99.999999999% durability.

Building and operating a pretty big storage system called S3 (2023) •Using Lightweight Formal Methods to Validate a Key-Value Storage Node in Amazon S3 (SOSP '21)

Week 43 — Oct 6, 2024

TAPL speedrun week 1

We’ll read Parts I–II of Types and Programming Languages by Benjamin Pierce, a classic book on type theory. If you’re like me and never formally studied this, let’s speedrun through it together. Feel free to start before the meeting if you want!

Types and Programming Languages

Week 44 — Oct 13, 2024

TAPL speedrun week 2

We’ll read Parts III–IV (Subtyping, Recursive Types) of Types and Programming Languages. Try to catch up to Chapter 14 before coming!

Types and Programming Languages

Week 45 — Oct 20, 2024

TAPL speedrun week 3

We’ll read Parts V–VI (Polymorphism, Higher-Order Systems) of Types and Programming Languages, completing the book. Try to catch up to Chapter 28 before coming!

Types and Programming Languages

Week 46 — Oct 27, 2024

50 years of SQL

SQL was introduced in 1974, so this year it turns 50. (Woah! It’s so old!) Let’s celebrate databases by reading an old paper from each decade, so we can reflect on how SQL has found and kept its place uncannily well as the world changes.

SEQUEL (1974) •Critique of SQL (1983) •Critique of SQL Isolation Levels (SIGMOD '95) •C-Store (VLDB '05) •Shark: SQL on Spark (SIGMOD '13)

Week 47 — Nov 10, 2024

GPU sharing

Host: Rene Ravanan

We’ll discuss spatial and temporal sharing of GPUs. How can you run multiple applications on the same accelerator hardware?

GPU Sharing

Week 48 — Nov 24, 2024

Code reading: Chalk

Chalk is an experimental system that implements the Rust trait system, based on logic programming. It currently powers rust-analyzer. We’ll read it as a case study on type system implementation.

Chalk book •Chalk source code

Week 49 — Dec 8, 2024

Code reading: simdjson

Some people like to push things to their limit. We’ll read the (relatively short!) source code of simdjson, one of the fastest popular JSON parsers. We’ll learn a thing or two about SIMD, parsing, and performance optimization along the way.

Parsing Gigabytes of JSON per Second (VLDB '19) •simdjson source

Week 50 — Dec 29, 2024

Eg-walker, hybrid CRDT

We’ll check out recent advancements in hybrid OT/CRDT algorithms for collaborative editing and offline synchronization. We’ll also read new research on problems such as interleaving, rich text, and tree CRDTs, which are summarized in blog posts by Loro.

Eg-walker •Introduction to Loro's Rich Text CRDT •Movable tree CRDTs

Week 51 — Mar 2, 2025

Web bundlers

The most widely used compiler infrastructure in the world. How does Wasm work, or top-level await, tree shaking, even Tailwind — really, how can rich language tooling push the web forward?

esbuild architecture.md •esbuild Wasm Plugin •Lightning CSS •Rolldown (stages/)

Week 52 — May 27, 2025

Code reading: Firecracker VMM

Micro-virtualization for serverless computing in AWS Lambda. Instant cold-starts while supporting a broad range of features with KVM, balancing security with performance.

Firecracker VMM source •Design document

Week 53 — Jun 29, 2025

The new CPython JIT

Copy-and-patch compilation for speeding up high-level language bytecode. What are the tradeoffs versus existing JITs, and what does it mean to make Python fast?

Copy-and-Patch Compilation (OOPSLA '21) •Building a baseline JIT for Lua automatically •PEP 744

Organizers

This is being run by me, Eric! (@ekzhang1)

Before starting NYSRG, I also ran a similar community at Harvard in 2022–2023. I like learning about computers, and it's more fun when you're together with others.

NYSRG wouldn't be where it is today without you. Thank you for stopping by!

Join NYSRG