We continually invest in new research to advance innovations that preserve individual privacy while enabling valuable insights from data. Earlier this year, we launched Password Checkup, a Chrome extension that helps users detect if a username and password they enter on a website has been compromised. It relies on a cryptographic protocol known as private set intersection (PSI) to match your login’s credentials against an encrypted database of over 4 billion credentials Google knows to be unsafe. At the same time, it ensures that no one – including Google – ever learns your actual credentials.
Today, we’re rolling out the open-source availability of Private Join and Compute, a new type of secure multi-party computation (MPC) that augments the core PSI protocol to help organizations work together with confidential data sets while raising the bar for privacy.
Collaborating with data in privacy-safe ways
Many important research, business, and social questions can be answered by combining data sets from independent parties where each party holds their own information about a set of shared identifiers (e.g. email addresses), some of which are common. But when you’re working with sensitive data, how can one party gain aggregated insights about the other party’s data without either of them learning any information about individuals in the datasets? That’s the exact challenge that Private Join and Compute helps solve.
Using this cryptographic protocol, two parties can encrypt their identifiers and associated data, and then join them. They can then do certain types of calculations on the overlapping set of data to draw useful information from both datasets in aggregate. All inputs (identifiers and their associated data) remain fully encrypted and unreadable throughout the process. Neither party ever reveals their raw data, but they can still answer the questions at hand using the output of the computation. This end result is the only thing that’s decrypted and shared in the form of aggregated statistics. For example, this could be a count, sum, or average of the data in both sets.
A deeper look at the technology
Private Join and Compute combines two fundamental cryptographic techniques to protect individual data:
- Private set intersection allows two parties to privately join their sets and discover the identifiers they have in common. We use an oblivious variant which only marks encrypted identifiers without learning any of the identifiers.
- Homomorphic encryption allows certain types of computation to be performed directly on encrypted data without having to decrypt it first, which preserves the privacy of raw data. Throughout the process, individual identifiers and values remain concealed. For example, you can count how many identifiers are in the common set or compute the sum of values associated with marked encrypted identifiers – without learning anything about individuals.
This combination of techniques ensures that nothing but the size of the joined set and the statistics (e.g. sum) of its associated values is revealed. Individual items are strongly encrypted with random keys throughout and are not available in raw form to the other party or anyone else.
Watch this video or click to view the full infographic below on how Private Join and Compute works:
Using multi-party computation to solve real-world problems
Multi-party computation (MPC) is a field with a long history, but it has typically faced many hurdles to widespread adoption beyond academic communities. Common challenges include finding effective and efficient ways to tailor encryption techniques and tools to solve practical problems.
We’re committed to applying MPC and encryption technologies to more concrete, real-world issues at Google and beyond by making privacy technology more widely available. We are exploring a number of potential use cases at Google across collaborative machine learning, user security, and aggregated ads measurement.
And this is just the beginning of what’s possible. This technology can help advance valuable research in a wide array of fields that require organizations to work together without revealing anything about individuals represented in the data. For example:
- Public policy – if a government implements new wellness initiatives in public schools (e.g. better lunch options and physical education curriculums), what are the long-term health outcomes for impacted students?
- Diversity and inclusion – when industries create new programs to close gender and racial pay gaps, how does this impact compensation across companies by demographic?
- Healthcare – when a new preventative drug is prescribed to patients across the country, does it reduce the incidence of disease?
- Car safety standards – when auto manufacturers add more advanced safety features to vehicles, does it coincide with a decrease in reported car accidents?
Product Manager – Nirdhar Khazanie
Software Engineers – Mihaela Ion, Benjamin Kreuter, Erhan Nergiz, Quan Nguyen, and Karn Seth
Research Scientist – Mariana Raykova
Helping organizations do more without collecting more data