An interactive set of Jupyter notebooks, hostable with JupyterLite, for learning Differential Privacy.

Overview of the module

Core Concepts: Differential Privacy provides a mathematically rigorous way to quantify privacy loss when sharing aggregate data. Intuitively, it ensures that the outcome of any analysis is nearly the same whether or not any one individual’s data is included. A differentially private algorithm injects noise (typically drawn from a Laplace or Gaussian distribution) into query results so that individual records cannot be inferred. Key concepts include the privacy budget (ε), which bounds how much information can leak, and mechanisms such as the Laplace Mechanism for numeric queries and the Exponential Mechanism for categorical outputs. There are two main models: Global DP (noise added by a trusted server to aggregated data) and Local DP (noise added on each user’s device, as deployed by Apple). Understanding composition theorems (how privacy degrades when combining multiple analyses) and the trade-off between privacy and accuracy is crucial.
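For reference, the formal statements behind these concepts (as introduced in Dwork et al., 2006):

```latex
% A randomized mechanism M is \varepsilon-differentially private if, for all
% neighboring datasets D, D' (differing in one individual's record) and all
% sets of outputs S:
\Pr[M(D) \in S] \le e^{\varepsilon} \cdot \Pr[M(D') \in S]

% The Laplace mechanism achieves this for a numeric query f with
% L1-sensitivity \Delta f = \max_{D \sim D'} \lVert f(D) - f(D') \rVert_1:
M(D) = f(D) + \mathrm{Lap}\!\left(\frac{\Delta f}{\varepsilon}\right)

% Basic sequential composition: running k mechanisms with budgets
% \varepsilon_1, \dots, \varepsilon_k consumes at most
\varepsilon_{\text{total}} = \sum_{i=1}^{k} \varepsilon_i
```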

Key Papers & Books:

  • Dwork et al., 2006 – “Calibrating Noise to Sensitivity in Private Data Analysis”, the paper that introduced differential privacy. It formally defines ε-DP and presents the basic mechanisms.
  • Dwork & Roth, 2014The Algorithmic Foundations of Differential Privacy, a foundational textbook covering DP definitions, composition theorems, and advanced topics (free draft available online).
  • Cynthia Dwork’s MSR Technical Report (2008) – “Differential Privacy”, a gentle introduction with examples.
  • U.S. Census Bureau, 2020 – research papers on applying DP to the 2020 Census (a real-world use case of DP at scale).

Practical Exercises:

  • Implement a simple DP mechanism: for example, write a function that answers counting queries on a dataset with Laplace noise added, then vary ε to see the effect on accuracy vs. privacy (a minimal sketch appears after this list).
  • Use an open-source DP library (e.g. Google’s DP library or IBM’s diffprivlib) to answer queries on a sample database (like movie ratings). Measure how often the true answer falls within the noisy confidence interval.
  • Reproduce a result from a DP research paper: e.g., implement the Report-Noisy-Max algorithm or the DP-SGD (stochastic gradient descent) algorithm for training a model with differential privacy.
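A minimal sketch of the first exercise, using NumPy; the toy dataset and function names are illustrative, not part of the module:

```python
import numpy as np

def dp_count(data, predicate, epsilon):
    """Answer a counting query with the Laplace mechanism.

    A counting query has L1-sensitivity 1 (adding or removing one record
    changes the count by at most 1), so Laplace noise with scale 1/epsilon
    gives epsilon-DP.
    """
    true_count = sum(predicate(x) for x in data)
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Toy dataset: ages of 1000 individuals. Query: how many are over 40?
rng = np.random.default_rng(0)
ages = rng.integers(18, 90, size=1000)

for eps in [0.01, 0.1, 1.0, 10.0]:
    answers = [dp_count(ages, lambda a: a > 40, eps) for _ in range(5)]
    print(f"epsilon={eps}: {[round(a, 1) for a in answers]}")
```

Repeating the query at each ε makes the trade-off visible: small ε gives wildly scattered answers (strong privacy, low accuracy), while large ε concentrates tightly around the true count.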

Portfolio Project Ideas: Develop a toy “analytics platform” that collects data from users with privacy guarantees. For instance, create a web application that logs user interactions (clicks, keystrokes) and uses local differential privacy so that only noisy data is sent to the server. Showcase aggregate stats with and without DP to illustrate the privacy/utility trade-off. Another idea is to build a demonstration of the Netflix Prize de-anonymization attack and then apply DP to prevent such re-identification. Present your project with a blog post explaining how differential privacy protects users, and share the code on GitHub.
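The standard building block for the local-DP idea above is randomized response. A minimal sketch for a single binary attribute per user (function names and parameters are illustrative):

```python
import numpy as np

def randomized_response(bit, epsilon, rng):
    """Report one private bit under epsilon-local-DP.

    The true bit is kept with probability p = e^eps / (1 + e^eps)
    and flipped otherwise; each user randomizes locally before sending.
    """
    p = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    return bit if rng.random() < p else 1 - bit

def estimate_rate(reports, epsilon):
    """Debias the observed frequency to estimate the true proportion."""
    p = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    observed = np.mean(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)

rng = np.random.default_rng(42)
true_bits = rng.random(100_000) < 0.3   # e.g. 30% of users clicked
reports = [randomized_response(int(b), 1.0, rng) for b in true_bits]
print(estimate_rate(reports, 1.0))      # approximately 0.3
```

The server never sees a true bit, yet because the flip probability is known, the aggregate rate can be recovered with accuracy that improves as the number of users grows.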

Open-Source Projects to Explore: Consider contributing to OpenDP (an open-source project led by the academic community to build DP tools) or TensorFlow Privacy (a library for training machine learning models with DP). Another great project is OpenMined’s PyDP, which wraps Google’s DP C++ library – you can help by improving algorithms or documentation. These projects will expose you to real-world challenges in implementing DP and let you collaborate with experts.

Advanced Research Topics: If you aim to publish or dive deeper, explore privacy accounting techniques for complex pipelines (e.g., Rényi DP), or research domain-specific DP (applying DP to location data or graph data, which may require new mechanisms). Another hot area is Differential Privacy in Deep Learning – improving the accuracy of DP-SGD training for deep neural networks, where current techniques degrade model accuracy significantly. There are open problems in designing noise-addition strategies that preserve more utility, and in adaptive budgeting, where the algorithm decides dynamically how to spend its privacy budget. Publishing a paper on improving the privacy/utility trade-off, or on combining DP with other PETs (e.g., DP together with federated learning), is a way to contribute to the state of the art.
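For context, Rényi DP (Mironov, 2017) replaces the worst-case probability ratio in ε-DP with a Rényi-divergence bound, which composes cleanly across pipeline stages:

```latex
% A mechanism M satisfies (\alpha, \varepsilon)-R\'enyi DP if, for all
% neighboring datasets D, D':
D_{\alpha}\bigl(M(D) \,\|\, M(D')\bigr)
  = \frac{1}{\alpha - 1} \log \mathbb{E}_{x \sim M(D')}
    \left[ \left( \frac{M(D)(x)}{M(D')(x)} \right)^{\alpha} \right]
  \le \varepsilon

% RDP budgets add up across mechanisms at a fixed \alpha, and an
% (\alpha, \varepsilon)-RDP guarantee converts to standard
% (\varepsilon', \delta)-DP via
\varepsilon' = \varepsilon + \frac{\log(1/\delta)}{\alpha - 1}
```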

Sources:

  1. Green, Matthew. “What is Differential Privacy?” – Cryptography Engineering blog.
  2. Apple Differential Privacy Overview – how Apple uses local DP for user data.
