
CSC443H1

Database System Technology

Instructor: Niv Dayan

Lectures: Tuesday 2-4 pm, MS2170

Tutorials: Thursday, 3-4 pm, SS2110

TAs: Akshay Bapat & Pooyan Habibi

Office Hours: Tuesday & Thursday at 4 pm

Course Description

Database management systems are the bookkeepers of modern civilization. This course covers the algorithms and data structures that constitute the guts, bones, and arteries of these systems. Much of this course is about understanding the properties of modern hardware devices and designing data structures and access methods that optimize for them. We will also see that there is rarely one "correct" way to store data: it is a question of managing trade-offs to optimize for what the user is trying to do. This course is important for anyone who intends to work with, tune, extend, or build data management systems.

Piazza

We will be using Piazza as our main discussion board. You are responsible for reading all postings made by me or the TAs. Please use Piazza to ask questions about assignments and course lecture materials so that everyone can benefit.

Contact

Course announcements will arrive through Quercus. Aside from that, this course website is required reading. It contains essential material and will be updated throughout the semester. Please use Piazza whenever possible to ask questions about course material. For personal questions, email me at nivdayan@gmail.com. Please include "csc443" in the subject line along with your full name. If you do not hear back quickly, we are always available during office hours to help.

Final Project

The course involves a major group programming assignment, as described in the following Google doc. The cheatsheet is here. The grading sheet is here.

Final Exam

The final exam will be scheduled by Arts & Science and take place during the examination period.

Accessibility

The University of Toronto is committed to accessibility. If you require accommodations or have any accessibility concerns, please visit Accessibility Services as soon as possible.

Prerequisites

Students should have taken the courses listed in this calendar entry or have equivalent knowledge in algorithms, data structures, relational algebra, SQL, and operating systems. Hands-on experience with C/C++, Java or Python is also required.

Reading Material

The course textbook is "Database Management Systems" by Raghu Ramakrishnan and Johannes Gehrke, 3rd Edition. It is available new and used at the UofT bookstore and in the library. We are interested in Parts III to V. While this textbook is classic, it is also dated (20 years old). Therefore, the reading material will also draw from various sources, including research papers and articles, about state-of-the-art advances in the field.

Academic Integrity

Homeworks should be done individually and the work you submit must be your own. It is an academic offence to copy someone else's work. Whether you copy or let someone else copy, it is an offence. Academic offences are taken very seriously. At the same time, we want you to benefit from working with other students. It is appropriate to discuss course material related to homeworks, and we encourage you to do so.

Marking Scheme

There will be a midterm (20%), a final (40%) and a group project (40%) for this course.

Academic Integrity

The project hand-ins must be the group’s own work. It is an academic violation to copy code or experimental results from other groups, whether you copy yourself or let someone else copy. That said, we encourage you to discuss course material widely with your fellow students within and across groups.

Course schedule

01
We will first introduce the course and its logistics.

We will then take a deep dive into the topic of how to expand an array data structure when it runs out of space while managing a contention between write and space overheads. We discuss how for a contiguous array, it is desirable for the growth factor to be lower than the golden ratio to allow the memory allocator to reuse deallocated space. We then study how to employ a layer of indirection to overcome the cost contention between write and space costs, and we use CPU tricks to make the indirection layer efficient.
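The golden-ratio claim above can be checked with a small simulation. This is a minimal sketch, not the lecture's own material: the function name `first_reuse_step` is hypothetical, and it assumes an idealized allocator in which all previously freed blocks are adjacent and coalesce into one region. It asks at which resize the freed blocks together become large enough to hold the new block.

```python
def first_reuse_step(growth, max_steps=64):
    """Return the first resize at which all previously freed blocks,
    taken together, are large enough to hold the new block.
    Returns None if this never happens within max_steps.

    Assumption: freed blocks are contiguous and coalesced by the allocator.
    """
    sizes = [1.0]  # capacity of every block allocated so far
    for step in range(1, max_steps):
        new_size = sizes[-1] * growth
        # The current block must stay alive while we copy out of it,
        # so only the blocks allocated before it count as reusable.
        freed = sum(sizes[:-1])
        if freed >= new_size:
            return step
        sizes.append(new_size)
    return None


if __name__ == "__main__":
    for g in (1.5, 1.6, 1.7, 2.0):
        print(g, first_reuse_step(g))
```

Under these assumptions a growth factor of 2 never qualifies, because the freed blocks always sum to less than the block being requested, while a factor of 1.5 qualifies after a few resizes.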

The slides are here.

The lecture video is here. Passcode: ?zPx.PCRp7

Papers covered:
Resizable Arrays in Optimal Time and Space. WADS 1999.

Additional Resources
https://github.com/facebook/folly/blob/main/folly/docs/FBVector.md
02
We will cover the analysis of Bloom filters from the ground up, and we will explore recent optimizations to better tailor Bloom filters to modern hardware by making them cache-efficient and parallel.

We'll also cover XOR filters as a more space-efficient counterpart to Bloom filters. We will see how to construct such a filter using "peeling", a general probabilistic technique to assign items to buckets while ensuring low collision probability.
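As a concrete reference point for the Bloom filter analysis, here is a minimal sketch of a standard (not cache-blocked) Bloom filter, together with the textbook false positive estimate (1 - e^(-kn/m))^k. The class and function names are illustrative, and deriving the k probe positions from one SHA-256 digest via double hashing is an assumption made for brevity, not the scheme from the papers.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive k positions from two 64-bit halves of a digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # force an odd stride
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def expected_fpr(num_bits, num_keys, num_hashes):
    """Standard approximation of the false positive rate."""
    return (1 - math.exp(-num_hashes * num_keys / num_bits)) ** num_hashes


if __name__ == "__main__":
    bf = BloomFilter(num_bits=10_000, num_hashes=7)
    for i in range(1000):
        bf.add("key{}".format(i))
    print(bf.might_contain("key42"), expected_fpr(10_000, 1000, 7))
```

Each lookup here touches up to k unrelated cache lines, which is the cost that cache-efficient variants aim to avoid.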

The slides are here.

The lecture video is here. Passcode: 9AEmCVwRR+

Papers covered:
Performance-Optimal Filtering: Bloom Overtakes Cuckoo at High Throughput - VLDB 2019
Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters - JEA 2020.

Additional Resources
For background on Bloom filters, please have a look at the lecture on Bloom filters in week 5 of CSC443 here.
03
We will then explore Quotient Filters, a recent type of filter that can support dynamic inserts and deletes. We will see how it achieves this by efficiently encoding its state using a small amount of metadata, and how it supports deletes using Robin Hood hashing.

Finally, we will see how to efficiently expand a quotient filter using InfiniFilter, a recent advancement that scales both performance and the false positive rate by supporting variable-sized fingerprints within slots.

The slides are here.

The lecture recording is here. Passcode: 0cgVWpq!RF

Core Papers Covered:
A General-Purpose Counting Filter: Making Every Bit Count - SIGMOD 2017
InfiniFilter: Expanding Filters to Infinity and Beyond - SIGMOD 2023

Optional:
Aleph Filter: To Infinity in Constant Time - VLDB 2024
04
This week we'll explore how to efficiently encode exact sets (a collection of some N entries out of a universe of U keys). We'll do so using Golomb coding to maximize space efficiency. We'll then cover Elias-Fano encoding and how to query this encoding for the existence of an element in constant time using rank and select.
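To illustrate the Golomb coding idea, the sketch below encodes a sorted set by Rice-coding the gaps between consecutive keys. Rice coding is the special case of Golomb coding where the divisor is a power of two (2^k). The function names are hypothetical, bit strings stand in for a packed bit array, and the input set is assumed to be non-empty. Elias-Fano is not covered by this sketch.

```python
from itertools import accumulate


def rice_encode(values, k):
    """Encode non-negative integers as a bit string with Rice parameter k."""
    out = []
    for v in values:
        q, r = v >> k, v & ((1 << k) - 1)
        out.append("1" * q + "0")  # quotient in unary, terminated by a 0
        if k:
            out.append(format(r, "0{}b".format(k)))  # remainder in k bits
    return "".join(out)


def rice_decode(bits, k):
    values, i = [], 0
    while i < len(bits):
        q = 0
        while bits[i] == "1":  # read the unary quotient
            q += 1
            i += 1
        i += 1  # skip the terminating 0
        r = int(bits[i:i + k], 2) if k else 0
        i += k
        values.append((q << k) | r)
    return values


def encode_sorted_set(keys, k):
    # Store the first key followed by the gaps between neighbours.
    gaps = [keys[0]] + [b - a for a, b in zip(keys, keys[1:])]
    return rice_encode(gaps, k)


def decode_sorted_set(bits, k):
    # A running sum of the gaps restores the original keys.
    return list(accumulate(rice_decode(bits, k)))


if __name__ == "__main__":
    keys = [3, 7, 8, 20, 41]
    bits = encode_sorted_set(keys, k=2)
    print(bits, decode_sorted_set(bits, k=2))
```

Choosing k close to log2 of the average gap keeps the unary part short, which is where the space efficiency comes from.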

The slides are here.

The video recording is here. Passcode: =Tt$#dcU=8

Class Reading
Golomb Codes: An efficient coding scheme for integers. Link
Sorted Integers Compression with Elias-Fano Encoding. Link

Optional Further Reading:
Space-Efficient, High-Performance Rank & Select Structures on Uncompressed Bit Sequences. SEA 2013.
05
The problem we'll look at this week is how to store static sorted variable-length string keys and values in storage. Our case study will be RocksDB, an LSM-tree based database. Through this case study, we'll cover various compression techniques including delta encoding, LZ4, Huffman coding, dictionary training, etc.
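Of the techniques listed, delta encoding of sorted keys is the simplest to show in code. The sketch below stores each key as the length of the prefix it shares with the previous key plus the remaining suffix. The function names are hypothetical and this is not RocksDB's actual block format, which also inserts periodic restart points so that a block can be binary searched.

```python
def delta_encode(sorted_keys):
    """Encode each key as (shared_prefix_length, suffix) relative to its predecessor."""
    encoded, prev = [], ""
    for key in sorted_keys:
        shared = 0
        limit = min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


def delta_decode(encoded):
    """Rebuild the keys by reusing the shared prefix of the previous key."""
    keys, prev = [], ""
    for shared, suffix in encoded:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


if __name__ == "__main__":
    keys = ["apple", "applet", "apply", "banana"]
    print(delta_encode(keys))
```

Because every key depends on its predecessor, decoding must start from a full key, which is the motivation for restart points in practical formats.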

The slides are here.

The lecture video is here. Passcode: FA7BX4#vAX

Background
A few of the lectures in CSC443 would serve as good background for this lecture. In particular, it would help to watch lecture 1 on storage devices, lecture 2 on page formats (only the second part of the lecture), and lecture 4 on LSM-trees (only a basic understanding of the structure is needed).

Reading for Class
Optimizing Space Amplification in RocksDB, CIDR 2017.
06
07
08
This lecture will cover various index data structures for multi-dimensional data. We'll cover topics such as KD-trees, R-trees, and Z-ordering.
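Z-ordering maps a multi-dimensional point to a single integer by interleaving the bits of its coordinates, so that a one-dimensional index such as a B-tree can store it. The sketch below handles two dimensions. The function names and the 16-bit coordinate width are assumptions for illustration.

```python
def z_order(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one Morton code.
    Bit i of x goes to position 2i, bit i of y to position 2i + 1."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def z_order_decode(code, bits=16):
    """Invert z_order by pulling the even and odd bits apart."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return x, y


if __name__ == "__main__":
    print(z_order(3, 5))  # 39
    print(z_order_decode(39))  # (3, 5)
```

Points that are close in both dimensions tend to receive nearby codes, though the curve has jumps, so a range query over a rectangle may still scan some codes that fall outside it.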

The slides are here.

The lecture video is here. Passcode: 2brCM4T=NG

Papers
Multidimensional Binary Search Trees Used for Associative Searching. Comms of the ACM 1975.
R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD 1984.
Integrating the UB-Tree into a Database System Kernel. VLDB 2000.
09
We'll cover classic buffer pool techniques including LRU-K and 2Q.

For this lecture, recommended background is lecture 2 on buffer management from CSC443. I also recommend catching up on min-heaps from lecture 6 of CSC443.
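The core idea of LRU-K is to evict the page whose K-th most recent access lies furthest in the past, treating pages with fewer than K recorded accesses as infinitely old. The sketch below is a simplified illustration with hypothetical names. It scans all pages to pick a victim instead of using a min-heap, and it omits the paper's correlated reference period and its retention of history for evicted pages.

```python
from collections import deque


class LRUKCache:
    def __init__(self, capacity, k=2):
        self.capacity = capacity
        self.k = k
        self.history = {}  # page id -> deque of its (up to) k latest access times
        self.clock = 0

    def access(self, page):
        """Record an access and return the evicted page id, or None."""
        self.clock += 1
        evicted = None
        if page not in self.history:
            if len(self.history) >= self.capacity:
                evicted = self._choose_victim()
                del self.history[evicted]
            self.history[page] = deque(maxlen=self.k)
        self.history[page].append(self.clock)
        return evicted

    def _choose_victim(self):
        def priority(page):
            times = self.history[page]
            # times[0] is the k-th most recent access once k accesses are recorded.
            kth = times[0] if len(times) == self.k else float("-inf")
            return (kth, times[-1])  # ties broken by plain LRU

        return min(self.history, key=priority)


if __name__ == "__main__":
    cache = LRUKCache(capacity=2, k=2)
    for page in ["A", "A", "B"]:
        cache.access(page)
    print(cache.access("C"))  # evicts "B", which was touched only once
```

In the usage example, plain LRU would evict A because B was touched more recently. LRU-2 keeps A because it has been accessed twice, which is what makes the policy resistant to one-off scans.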

The slides are here.

The lecture video is here. Passcode: 36vs.@Cq4b

Lecture papers
The LRU-K page replacement algorithm for database disk buffering. SIGMOD Record 1993.
2Q: a low overhead high performance buffer management replacement algorithm. VLDB 1994.
10
In previous courses, you have learned about balanced binary trees as good methods of indexing data in memory. As we'll see in this lecture, such methods aren't particularly well-optimized for modern CPUs and memory architectures. In this lecture, we'll cover various state-of-the-art in-memory indexes. We'll also cover methods that exploit the data distribution to improve on the worst-case bounds, including interpolation search and learned indexes.
The slides are here.

The presentation video is here. Passcode: !PmsjGd1x9

Papers Covered in Lecture
A study of index structures for main memory database management systems. 1985.
Cache conscious indexing for decision-support in main memory. VLDB 1999.
Making B+-trees cache conscious in main memory. SIGMOD 2000.
The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases. ICDE 2013.
FITing-Tree: A Data-aware Index Structure. SIGMOD 2019.

You can find an analysis of interpolation search here.
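For reference, interpolation search can be sketched as follows. Instead of probing the midpoint as binary search does, it estimates the position of the target from the values at both ends of the current range. This is a minimal sketch that assumes sorted integer keys, and the function name is illustrative.

```python
def interpolation_search(sorted_keys, target):
    """Return an index of target in sorted_keys, or -1 if it is absent."""
    lo, hi = 0, len(sorted_keys) - 1
    while lo <= hi and sorted_keys[lo] <= target <= sorted_keys[hi]:
        if sorted_keys[hi] == sorted_keys[lo]:
            # All keys in the range are equal, so avoid dividing by zero.
            return lo if sorted_keys[lo] == target else -1
        # Estimate the position, assuming keys are spread evenly over the range.
        pos = lo + (target - sorted_keys[lo]) * (hi - lo) // (
            sorted_keys[hi] - sorted_keys[lo])
        if sorted_keys[pos] == target:
            return pos
        if sorted_keys[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1


if __name__ == "__main__":
    keys = list(range(0, 1000, 10))
    print(interpolation_search(keys, 370))  # 37
    print(interpolation_search(keys, 375))  # -1
```

On evenly spread keys like these, the first probe lands on or next to the target. On skewed data the estimate can be far off, and the search can degrade to a linear number of probes.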
11
This week we'll focus on how to achieve fast scans in column-stores via hardware-optimized designs. Recommended background is the lecture on column-stores from CSC443.

The lecture slides are here.

The lecture video is here. Passcode: G%.k.@i32G

The student presentation is here. Passcode: G%.k.@i32G

Papers
BitWeaving: Fast Scans for Main Memory Data Processing. SIGMOD 2013.
Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation. SIGMOD 2018.
Rethinking the Encoding of Integers for Scans on Skewed Data. SIGMOD 2024.
12
Visit by Vast Data
Speakers: Moshe Gabel and Vlad Zdornov

Title: Exascale LSMs at VAST Data

Abstract:
The VAST Data Platform (AI OS) is a unified platform for storing, querying, processing, streaming, enriching and indexing all types of structured and unstructured data. It is built on a novel disaggregated architecture (DASE), allowing it to scale to exabytes of capacity while maintaining exceptional performance, reliability and ease of operation. The VAST Data Platform comprises a multi-protocol all-Flash storage solution (VAST DataStore), a transactional, analytical and vector database coupled with an event streaming broker (VAST DataBase) and a pipeline orchestration infrastructure (VAST DataEngine). VAST AI OS now powers some of the world's most demanding workloads: AI training, financial forecasting, high performance scientific computing (including Canada Compute), academic research, and more.

This two-part talk will focus on the design and challenges of building the VAST DataBase: an LSM-based database with multi-version concurrency control that supports both analytical and vector workloads. Optimized for multi-PB sized tables with trillions of rows and vectors, the VAST DataBase provides high performance during insert, query and delete operations – with guaranteed transactional consistency, durability, isolation and availability.

We will first briefly present the DASE architecture and the different platform components, focusing on VAST DataStore and VAST DataBase. The main part of the talk will provide a detailed tour of our LSM variant, how it works within the VAST Database, and some of the unique challenges caused by our scale and our unique architecture -- which renders many common approaches infeasible.

About the company
VAST Data was founded in 2016 and is one of the fastest growing infrastructure companies in history. VAST Data is enabling the AI revolution with industry giants like CoreWeave and xAI being among its customers. The company has more than 1200 employees globally and a growing R&D center in the Toronto area. The Toronto branch operates as a startup-like organization developing the core of the database product and is engaged in the related research activities.

About the speakers
Vlad Zdornov has been with VAST Data since 2017, serving in different roles as an early-stage startup developer, architect and VP R&D. He holds an M.Sc. degree in Computer Engineering. Currently, Vlad is leading the VAST Data R&D center in Toronto. Dr. Moshe Gabel joined VAST Data from academia in 2025 as a senior software engineer. Before that, he was a professor of computer science and a researcher at the University of Toronto and York University.
13
This week we'll cover range filters, which can tell whether a whole range of keys is empty or not to allow range queries to skip accessing storage.

The lecture slides on intro and SuRF are here.

The slides on Memento Filter etc are here.

The video lecture is here. Passcode: ^60.sACY@9

Reading
SuRF: Practical Range Query Filtering with Fast Succinct Tries. SIGMOD 2018.
Memento Filter: A Fast, Dynamic, and Robust Range Filter. SIGMOD 2025.
14
We'll first introduce the course and what's to come. Please read Chapter 1 in the textbook.

We'll then take a deep look at the memory hierarchy. As we will see throughout the course, the properties of the memory hierarchy give rise to how modern databases are architected.

For information on storage, read Chapter 8 Part 8.1, and Chapter 9 Parts 9.1 and 9.2.
As you will notice, the textbook was written before SSDs became popular. I have written some background on SSDs here. Please study this carefully. This will also be useful when we study circular logs.

You may also skim the following article.

The intro slides are here.

The storage slides are here.

The intro lecture video is here. Passcode: aX@53KDBN*

The slides on RAID are here.

The lecture video on RAID is here. Note that this is the video from last year, used as a backup. There was a glitch with recording the lecture this time, so we are using last year's recording. The material is identical, and there are only minor differences between the slides.
15
A table in a relational database contains data, but how do we store tables in storage devices such that we can scan them quickly and also insert, update and delete elements efficiently? This week we’ll learn how to efficiently store tables. We’ll also learn how to bring this data in and out of main memory efficiently for processing using buffer management techniques.
In the textbook, read all of Chapter 9.

The lecture slides are here.

The lecture video is here. Passcode: WpLS*3KF%h

The tutorial slides are here.

The tutorial video is here. Note that we had another glitch in recording, so we are giving you instead the recording of this tutorial from last year. The material is the same.
16
Some queries only need to access a small amount of data in a given table. To avoid having to scan the entire table, indexes are used to navigate to a given data entry quickly. This week, we'll learn about the numerous indexing options of database systems.

There is a lot of textbook reading for this week, so start early. Please read all of Chapter 8 as an introduction to indexing. Chapter 10 will provide an in-depth look at B-trees (you can pay less attention to Part 10.2 as this technology is largely obsolete, though it's still worth reading about to understand how technology evolves). Please also read Chapter 11 Parts 11.1 and 11.2. You are encouraged to read the rest of Chapter 11 as well, but we won't cover it in the course.

The slides are here.

The lecture from section 1 is here. Passcode: 3h?AazaM3!

The tutorial questions are here.

The tutorial question & answer slides are here.

The tutorial video is here. We had another recording glitch, so this video is from last year. The questions & answers are identical, though there are a few insignificant differences in wording and aesthetics.
17
Traditional indexing techniques are optimized for reading but slow at ingesting new data. This week, we'll examine other types of database indexes used to optimize for data ingestion.

Write-optimized indexes became popular over the past 15 years, so they do not appear in your textbook. Please read sections 1, 2, and 3 in the following: https://nivdayan.github.io/monkeykeyvaluestore.pdf

The lecture slides are here.

The lecture video is here. Passcode: #R7D7hhh=k

The tutorial question slides are here.

The tutorial question and answers are here.

The tutorial video is here. Passcode: 0.Lc=S1@Qx
18
We will begin this lecture by studying Bloom filters and their use in storage systems - especially with LSM-trees. We will then have a one-hour research lecture covering advanced topics in the co-optimization of LSM-trees and Bloom filters.

Required reading:
Section 1 in: https://www.eecs.harvard.edu/~michaelm/postscripts/tempim3.pdf (you may also skim section 2 for the mathematical analysis)
Sections 1-4 in https://nivdayan.github.io/monkeykeyvaluestore.pdf
Sections 1-4 in: https://nivdayan.github.io/dostoevsky.pdf

If you find this material to be interesting, you are also welcome to read the following papers, though we won't cover them in the course.
https://nivdayan.github.io/LSM-bush.pdf
https://nivdayan.github.io/chucky.pdf
https://nivdayan.github.io/spooky.pdf

The lecture slides are here.

The lecture video is here. Passcode: e#6L+1@.1a

The tutorial questions are here.

The tutorial answers are here. Please only look after class.
19
Sorting is a basic operation in database systems to construct indices or return sorted output for queries. You've probably encountered sorting algorithms like merge-sort and quick-sort in the past. However, it turns out these algorithms are no longer the best choice when most of our data reside in storage. This week, we'll learn about multi-way sort-merge, a sorting algorithm for vast amounts of data. Read Chapter 13 in the textbook.
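The two phases of multi-way sort-merge can be sketched in a few lines. This is an in-memory simulation under stated assumptions: Python lists stand in for runs that would live in storage, `memory_capacity` is the number of records that fit in memory at once, and `fan_in` (which must be at least 2) is the number of runs merged together in one pass. All names are hypothetical.

```python
import heapq


def external_sort(records, memory_capacity, fan_in):
    """Return (sorted_records, number_of_merge_passes)."""
    # Phase 1: cut the input into memory-sized chunks and sort each one.
    runs = [sorted(records[i:i + memory_capacity])
            for i in range(0, len(records), memory_capacity)]

    # Phase 2: repeatedly merge groups of fan_in runs until one run remains.
    # heapq.merge keeps a min-heap over the current head of each run.
    passes = 0
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
        passes += 1
    return (runs[0] if runs else []), passes


if __name__ == "__main__":
    data = [9, 3, 7, 1, 8, 2, 6, 4, 5, 0]
    print(external_sort(data, memory_capacity=2, fan_in=2))
    print(external_sort(data, memory_capacity=2, fan_in=5))
```

A larger fan-in reduces the number of passes over the data, which matters because each pass reads and writes every record once.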

The lecture slides are here.

The lecture video is here. Passcode: p8YjL^J1C2

The tutorial questions are here.

The tutorial slides are here.

The tutorial video is here. Passcode: Z*v4dddj+Z
20
21
22
23
While traditional databases store data in each table row by row, a newer breed of databases is storing the data column by column to optimize more for analytical queries and statistical calculations. This week, we'll learn about column-stores, the cornerstone of modern data warehousing.
As Column-Stores rose to prominence over the past 15 years, they do not appear in your textbook. Therefore, please have a look at "The Design and Implementation of Modern Column-Oriented Database Systems" by Abadi et al:

https://stratos.seas.harvard.edu/files/stratos/files/columnstoresfntdbs.pdf

Read Parts 1, 2, 3, 4.1, 4.2, 4.3, 4.4, 4.7, 4.8, and 5.

The slides are here.

The lecture video is here. Passcode: yM^uw!my7D

The tutorial questions are here.

The tutorial questions and answers are here.

The tutorial video is here. Passcode: Nxzi@4M#y%
24
This week, we'll take a look at an emerging KV-store paradigm that involves logging data in storage and indexing it from memory. We’ll study how to implement this index efficiently using Cuckoo hashing and Cuckoo filters. We’ll also examine the importance of separating hot and cold data to lower garbage-collection overheads, and we’ll introduce the count-min sketch to help us achieve this. We will also discuss how to recover the index when power fails.
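A count-min sketch estimates how often each key has been seen using a small, fixed grid of counters, which is what makes it usable for telling hot keys from cold ones. The sketch below is a minimal illustration with hypothetical names. Deriving the per-row hash by salting SHA-256 with the row number is an assumption made for simplicity.

```python
import hashlib


class CountMinSketch:
    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One independent-looking hash per row, obtained by salting with the row id.
        digest = hashlib.sha256("{}:{}".format(row, key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += count

    def estimate(self, key):
        # Collisions can only inflate a counter, so the minimum across
        # rows is the tightest estimate and never undercounts.
        return min(self.rows[row][self._index(row, key)]
                   for row in range(self.depth))


if __name__ == "__main__":
    cms = CountMinSketch(width=64, depth=4)
    for _ in range(100):
        cms.add("hot")
    cms.add("cold")
    print(cms.estimate("hot"), cms.estimate("cold"))
```

A key whose estimate exceeds some threshold can be treated as hot and written to a separate log region. The threshold and how counters are aged over time are design choices this sketch leaves out.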

First, have a look at Bitcask, a very simple and easy-to-understand KV-store that indexes the full keys of data entries in storage.
Then, have a look at Cuckoo hashing, a hash table design that’s able to achieve very good hash table utilization and CPU efficiency at the same time. Then, read up on Cuckoo filters, which store fingerprints rather than full keys within a cuckoo hash table to further reduce memory requirements. The Wikipedia articles on these topics are also recommended.

Finally, have a look at FlashStore, a more memory-efficient KV-store design than Bitcask that stores fingerprints rather than full keys in memory and employs a Cuckoo filter variant to index them (since Cuckoo filters had not yet been invented at the time).

Two KV-stores out in the wild that employ this paradigm are FASTER and ForestDB. Feel free to browse them.

The notion of hot/cold data identification and separation has been explored more in the context of SSDs and flash translation layers. The following paper, for instance, employs a similar construction to count-min.

The lecture slides are here.

The lecture video is here. Passcode: kyqaWJU4^y

The tutorial questions are here.

The tutorial video is here. Passcode: MJ@3WO+5ZU

The tutorial slides are here.
25
We'll introduce Transaction Management and Concurrency Control this week, the part of the database that ensures different operations can execute in parallel yet correctly, even if the system might fail at any moment.

Please read Chapters 16 and 17 in the textbook.

The lecture slides are here.

The lecture video is here. Passcode: QE79a&J%ox

The tutorial slides are here.

The tutorial video is here.
26

We hope this course will get you excited about research. For students who excel in this course and seek research opportunities, check out my home page and get in touch.

Course schedule

Week 1: Dynamic Arrays

Week 2: Static Filters

Week 3: Dynamic Filters

Week 4: Succinct Sets

Week 5: Storing Sorted Strings

Week 6: Practical Perfect Hashing

Week 7: Reading Week

Week 8: Multi-Dimensional Indexing

Week 9: Advanced Buffer Pools

Week 10: In-Memory Indexing

Week 11: Advanced Column-Stores

Week 12: Visit by Vast Data

Week 13: Range Filters

Week 1: Introduction & Storage

Week 2: Table and Buffer Management

Week 3: Indexing with Hash Tables and B-Trees

Week 4: Write-Optimized Indexing

Week 5: Bloom Filters & Research Lecture on LSM-Trees

Week 6: External Sorting

Week 7: Midterm

Week 8: Relational Operators & Query Optimization

Week 9: Reading Week - Rest & Study

Week 10: Column-Stores

Week 11: Circular Logs & Cuckoo Filters

Week 12: Transaction Management & Concurrency Control

Week 13: Recovery