publications
2025
- [PoPETs] SoK: Descriptive Statistics Under Local Differential Privacy. René Raab, Pascal Berrang, Paul Gerhart, and 1 more author. Proceedings on Privacy Enhancing Technologies (PoPETs), 2025
Local Differential Privacy (LDP) provides a formal guarantee of privacy that enables the collection and analysis of sensitive data without revealing any individual’s data. While LDP methods have been extensively studied, there is a lack of a systematic and empirical comparison of LDP methods for descriptive statistics. In this paper, we first provide a systematization of LDP methods for descriptive statistics, comparing their properties and requirements. We demonstrate that several mean estimation methods based on sampling from a Bernoulli distribution are equivalent in the one-dimensional case and introduce methods for variance estimation. We then empirically compare methods for mean, variance, and frequency estimation. Finally, we provide recommendations for the use of LDP methods for descriptive statistics and discuss their limitations and open questions.
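As a concrete illustration of the Bernoulli-sampling mean estimators discussed above, the sketch below implements one standard one-dimensional variant (stochastic rounding followed by randomized response) together with the debiasing step on the aggregator side. The function names, parameters, and synthetic data are illustrative assumptions and do not reproduce the paper’s exact estimators.

```python
import numpy as np

def ldp_one_bit_report(x, eps, rng):
    """One user's report: round x in [0, 1] to a bit by sampling Bernoulli(x),
    then apply randomized response calibrated to the privacy budget eps."""
    bit = rng.random() < x
    keep = rng.random() < np.exp(eps) / (1 + np.exp(eps))
    return bit if keep else not bit

def ldp_mean_estimate(reports, eps):
    """Aggregator side: debias the noisy bits into an unbiased mean estimate."""
    p = np.exp(eps) / (1 + np.exp(eps))   # probability of reporting truthfully
    y_bar = np.mean(reports)
    return (y_bar - (1 - p)) / (2 * p - 1)

rng = np.random.default_rng(0)
values = rng.beta(2, 5, size=100_000)     # synthetic sensitive values in [0, 1]
reports = [ldp_one_bit_report(x, eps=1.0, rng=rng) for x in values]
print(ldp_mean_estimate(reports, eps=1.0), values.mean())
```

With many users, the debiased estimate is close to the true mean; its variance grows as the privacy budget eps shrinks.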
2024
- [USENIX] Quantifying Privacy Risks of Prompts in Visual Prompt Learning. Yixin Wu, Rui Wen, Michael Backes, and 4 more authors. In Proceedings of the 33rd USENIX Security Symposium (Security), 2024
Large-scale pre-trained models are increasingly adapted to downstream tasks through a new paradigm called prompt learning. In contrast to fine-tuning, prompt learning does not update the pre-trained model’s parameters. Instead, it only learns an input perturbation, namely prompt, to be added to the downstream task data for predictions. Given the fast development of prompt learning, a well-generalized prompt inevitably becomes a valuable asset as significant effort and proprietary data are used to create it. This naturally raises the question of whether a prompt may leak the proprietary information of its training data. In this paper, we perform the first comprehensive privacy assessment of prompts learned by visual prompt learning through the lens of property inference and membership inference attacks. Our empirical evaluation shows that the prompts are vulnerable to both attacks. We also demonstrate that the adversary can mount a successful property inference attack with limited cost. Moreover, we show that membership inference attacks against prompts can be successful with relaxed adversarial assumptions. We further make some initial investigations on the defenses and observe that our method can mitigate the membership inference attacks with a decent utility-defense trade-off but fails to defend against property inference attacks. We hope our results can shed light on the privacy risks of the popular prompt learning paradigm. To facilitate the research in this direction, we will share our code and models with the community.
- [PoPETs] Link Stealing Attacks Against Inductive Graph Neural Networks. Yixin Wu, Xinlei He, Pascal Berrang, and 4 more authors. Proceedings on Privacy Enhancing Technologies (PoPETs), 2024
A graph neural network (GNN) is a type of neural network that is specifically designed to process graph-structured data. Typically, GNNs can be implemented in two settings: the transductive setting and the inductive setting. In the transductive setting, the trained model can only predict the labels of nodes that were observed at training time. In the inductive setting, the trained model can be generalized to new nodes/graphs. Due to its flexibility, the inductive setting is the most popular GNN setting at the moment. Previous work has shown that transductive GNNs are vulnerable to a series of privacy attacks. However, a comprehensive privacy analysis of inductive GNN models is still missing. This paper fills the gap by conducting a systematic privacy analysis of inductive GNNs through the lens of link stealing attacks, one of the most popular attacks that are specifically designed for GNNs. We propose two types of link stealing attacks, i.e., posterior-only attacks and combined attacks. We define threat models of the posterior-only attacks with respect to node topology and the combined attacks by considering combinations of posteriors, node attributes, and graph features. Extensive evaluation on six real-world datasets demonstrates that inductive GNNs leak rich information that enables link stealing attacks with advantageous properties. Even attacks with no knowledge about graph structures can be effective. We also show that our attacks are robust to different node similarities and different graph features. As a counterpart, we investigate two possible defenses and discover that they are ineffective against our attacks, which calls for more effective defenses.
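For intuition only, the sketch below shows the core signal a posterior-only link stealing attack can exploit: nodes connected by an edge tend to receive similar posterior vectors from the target GNN, so thresholding a pairwise similarity already yields link predictions. The similarity metric, threshold, and toy posteriors are assumptions for illustration, not the attack pipeline evaluated in the paper.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def predict_links(posteriors, candidate_pairs, threshold=0.95):
    """posteriors: dict mapping node id -> class-probability vector obtained
    by querying the target GNN. Returns candidate pairs predicted as linked."""
    return [(u, v) for u, v in candidate_pairs
            if cosine(posteriors[u], posteriors[v]) >= threshold]

# Toy example: nodes 0 and 1 receive similar posteriors, node 2 does not.
posteriors = {
    0: np.array([0.80, 0.15, 0.05]),
    1: np.array([0.75, 0.20, 0.05]),
    2: np.array([0.10, 0.10, 0.80]),
}
print(predict_links(posteriors, [(0, 1), (0, 2), (1, 2)]))  # [(0, 1)]
```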
- [PoPETs] Measuring Conditional Anonymity—A Global Study. Pascal Berrang, Paul Gerhart, and Dominique Schröder. Proceedings on Privacy Enhancing Technologies (PoPETs), 2024
The realm of digital health is experiencing a global surge, with mobile applications extending their reach into various facets of daily life. From tracking daily eating habits and vital functions to monitoring sleep patterns and even the menstrual cycle, these apps have become ubiquitous in their pursuit of comprehensive health insights. Many of these apps collect sensitive data and promise users that their privacy will be protected – often through pseudonymization. We analyze the real anonymity that users can expect from this approach and report on our findings. More concretely: 1. We introduce the notion of conditional anonymity sets derived from statistical properties of the population. 2. We measure anonymity sets for two real-world applications and present overarching findings from 39 countries. 3. We develop a graphical tool for people to explore their own anonymity set. One of our case studies is a popular app for tracking the menstruation cycle. Our findings for this app show that, despite their promise to protect privacy, the collected data can be used to narrow users down to groups of at most 5 people in 97% of all US counties, allowing the de-anonymization of individuals. Given that the US Supreme Court recently overturned abortion rights, the possibility of determining individuals is a calamity.
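To make the notion of an anonymity set concrete, the sketch below counts how many people in a population share a record’s quasi-identifier values; the attribute names and toy population are invented for illustration and do not reproduce the paper’s conditional anonymity-set computation.

```python
from collections import Counter

def anonymity_set_sizes(records, quasi_identifiers):
    """records: list of dicts describing the population.
    quasi_identifiers: attributes an observer is assumed to know.
    Returns, for each record, how many people share its quasi-identifier values."""
    key = lambda r: tuple(r[q] for q in quasi_identifiers)
    counts = Counter(key(r) for r in records)
    return [counts[key(r)] for r in records]

# Hypothetical example: county and birth year alone already shrink the crowd.
population = [
    {"county": "A", "birth_year": 1990, "cycle_length": 28},
    {"county": "A", "birth_year": 1990, "cycle_length": 30},
    {"county": "B", "birth_year": 1985, "cycle_length": 27},
]
print(anonymity_set_sizes(population, ["county", "birth_year"]))  # [2, 2, 1]
```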
2023
- [NDSS] Accountable Javascript Code Delivery. Ilkan Esiyok, Pascal Berrang, Katriel Cohn-Gordon, and 1 more author. In Proceedings of the 30th Annual Network and Distributed System Security Symposium (NDSS), 2023
The internet is a major distribution platform for web applications, but there are no effective transparency and audit mechanisms in place for the web. Due to the ephemeral nature of web applications, a client visiting a website has no guarantee that the code it receives today is the same as it received yesterday, or the same as what other visitors receive. Despite advances in web security, it is thus challenging to audit web applications before they are rendered in the browser. We propose Accountable JS, a browser extension and opt-in protocol for accountable delivery of active content on a web page. We prototype our protocol, formally model its security properties with the Tamarin Prover, and evaluate its compatibility and performance impact with case studies including WhatsApp Web, AdSense and Nimiq. Accountability is beginning to be deployed at scale, with Meta’s recent announcement of Code Verify, which is available to all 2 billion WhatsApp users, but there has been little formal analysis of such protocols. We formally model Code Verify using the Tamarin Prover and compare its properties to our Accountable JS protocol. We also compare Code Verify’s and the Accountable JS extension’s performance impacts on WhatsApp Web.
- [WWW] On How Zero-Knowledge Proof Blockchain Mixers Improve, and Worsen User Privacy. Zhipeng Wang, Stefanos Chaliasos, Kaihua Qin, and 5 more authors. In Proceedings of the ACM Web Conference 2023, 2023
Among the most prominent and widely used blockchain privacy solutions are zero-knowledge proof (ZKP) mixers operating on top of smart contract-enabled blockchains. ZKP mixers typically advertise their level of privacy through a so-called anonymity set size, similar to k-anonymity, where a user hides among a set of k other users. In reality, however, these anonymity set claims are mostly inaccurate, as we find through empirical measurements of the currently most active ZKP mixers. We propose five heuristics that, in combination, can increase the probability that an adversary links a withdrawer to the correct depositor on average by 51.94% (108.63%) on the most popular Ethereum (ETH) and Binance Smart Chain (BSC) mixer, respectively. Our empirical evidence is hence also the first to suggest a differing privacy-predilection of users on ETH and BSC. We further identify 105 Decentralized Finance (DeFi) attackers leveraging ZKP mixers as a source of initial funds and as a place to deposit attack revenue (e.g., from phishing scams, hacking centralized exchanges, and blockchain project attacks).
State-of-the-art mixers are moreover tightly intertwined with the growing DeFi ecosystem by offering "anonymity mining" (AM) incentives, i.e., mixer users receive monetary rewards for mixing coins. However, contrary to the claims of related work, we find that AM does not always contribute to improving a mixer’s anonymity set, because AM tends to attract privacy-ignorant users who naively reuse addresses.
- [ICML] Data Poisoning Attacks Against Multimodal Encoders. Ziqing Yang, Xinlei He, Zheng Li, and 4 more authors. In 40th International Conference on Machine Learning (ICML), 2023
Traditional machine learning (ML) models usually rely on large-scale labeled datasets to achieve strong performance. However, such labeled datasets are often challenging and expensive to obtain. Also, the predefined categories limit the model’s ability to generalize to other visual concepts as additional labeled data is required. In contrast, newly emerged multimodal models, which contain both visual and linguistic modalities, learn the concept of images from raw text. This is a promising way to solve the above problems, as such models can use easy-to-collect image-text pairs to construct the training dataset, and the raw texts contain almost unlimited categories according to their semantics. However, learning from a large-scale unlabeled dataset also exposes the model to the risk of potential poisoning attacks, whereby the adversary aims to perturb the model’s training dataset to trigger malicious behaviors in it. Previous work mainly focuses on the visual modality. In this paper, we instead focus on answering two questions: (1) Is the linguistic modality also vulnerable to poisoning attacks? and (2) Which modality is most vulnerable? To answer these two questions, we conduct three types of poisoning attacks against CLIP, the most representative multimodal contrastive learning framework. Extensive evaluations on different datasets and model architectures show that all three attacks can perform well on the linguistic modality with only a relatively low poisoning rate and limited epochs. Also, we observe that the poisoning effect differs between different modalities, i.e., with lower MinRank in the visual modality and with higher Hit@K when K is small in the linguistic modality. To mitigate the attacks, we propose both pre-training and post-training defenses. We empirically show that both defenses can significantly reduce the attack performance while preserving the model’s utility.
2022
- [ESORICS] A framework for constructing Single Secret Leader Election from MPC. Michael Backes, Pascal Berrang, Lucjan Hanzlik, and 1 more author. In 27th European Symposium on Research in Computer Security (ESORICS), 2022
The emergence of distributed digital currencies has raised the need for a reliable consensus mechanism. In proof-of-stake cryptocurrencies, the participants periodically choose a closed set of validators, who can vote and append transactions to the blockchain. Each validator can become a leader with probability proportional to its stake. Keeping the leader private yet unique until it publishes a new block can significantly reduce the attack surface of an adversary and improve the throughput of the network. The problem of Single Secret Leader Election (SSLE) was first formally defined by Boneh et al. in 2020.
In this work, we propose a novel framework for constructing SSLE protocols, which relies on secure multi-party computation (MPC) and satisfies the desired security properties. Our framework does not use any shuffle or sort operations and has a computational cost for N parties as low as O(N) basic MPC operations per party. We improve the state-of-the-art for SSLE protocols that do not assume a trusted setup. Moreover, our SSLE scheme efficiently handles weighted elections. That is, for a total weight S of N parties, the associated costs are only increased by a factor of log S. When the MPC layer is instantiated with techniques based on Shamir’s secret-sharing, our SSLE has a communication cost of O(N^2) which is spread over O(log N) rounds, can tolerate up to t < N/2 faulty nodes without restarting the protocol, and its security relies on DDH in the random oracle model. When the MPC layer is instantiated with more efficient techniques based on garbled circuits, our SSLE requires all parties to participate, up to N-1 of which can be malicious, and its security is based on the random oracle model.
- [arXiv] Fine-Tuning Is All You Need to Mitigate Backdoor Attacks. Zeyang Sha, Xinlei He, Pascal Berrang, and 2 more authors. arXiv preprint arXiv:2212.09067, 2022
Backdoor attacks represent one of the major threats to machine learning models. Various efforts have been made to mitigate backdoors. However, existing defenses have become increasingly complex and often require high computational resources or may also jeopardize models’ utility. In this work, we show that fine-tuning, one of the most common and easy-to-adopt machine learning training operations, can effectively remove backdoors from machine learning models while maintaining high model utility. Extensive experiments over three machine learning paradigms show that fine-tuning and our newly proposed super-fine-tuning achieve strong defense performance. Furthermore, we coin a new term, namely backdoor sequela, to measure the changes in model vulnerabilities to other attacks before and after the backdoor has been removed. Empirical evaluation shows that, compared to other defense methods, super-fine-tuning leaves limited backdoor sequela. We hope our results can help machine learning model owners better protect their models from backdoor threats. Also, it calls for the design of more advanced attacks in order to comprehensively assess machine learning models’ backdoor vulnerabilities.
2020
- [EuroS&P] Membership Inference Against DNA Methylation Databases. Inken Hagestedt, Mathias Humbert, Pascal Berrang, and 4 more authors. In Proceedings of the 2020 IEEE European Symposium on Security and Privacy (EuroS&P), 2020
Biomedical data sharing is one of the key elements fostering the advancement of biomedical research but poses severe risks towards the privacy of individuals contributing their data, as already demonstrated for genomic data. In this paper, we study whether and to which extent DNA methylation data, one of the most important epigenetic elements regulating human health, is prone to membership inference attacks, a critical type of attack that reveals an individual’s participation in a given database. We design and evaluate three different attacks exploiting published summary statistics, among which one is based on machine learning and another is exploiting the dependencies between genome and methylation data. Our extensive evaluation on six datasets containing a diverse set of tissues and diseases collected from more than 1,300 individuals in total shows that such membership inference attacks are effective, even when the target’s methylation profile is not accessible. It further shows that the machine-learning approach outperforms the statistical attacks, and that learned models are transferable across different datasets.
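For intuition, a minimal distance-based membership test on published summary statistics could look like the sketch below: it checks whether a target profile sits closer to the study’s per-site means than to a reference population’s means. This is a generic construction under assumed inputs, not one of the paper’s three attacks, and in practice the decision threshold would be calibrated on profiles known to be outside the study.

```python
import numpy as np

def membership_score(target, study_means, reference_means):
    """Positive scores suggest the target profile is closer to the study's
    published per-site means than to the reference population's means."""
    return float(np.sum(np.abs(target - reference_means)
                        - np.abs(target - study_means)))

# Toy data: a study of 10 profiles over 2,000 methylation sites, one of which
# belongs to the target; the reference population mean is taken as 0.5 per site.
rng = np.random.default_rng(1)
target = rng.random(2_000)
study_means = np.vstack([target, rng.random((9, 2_000))]).mean(axis=0)
reference_means = np.full(2_000, 0.5)
print(membership_score(target, study_means, reference_means) > 0)  # member: True
```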
2019
- [NDSS] ML-Leaks: Model and Data Independent Membership Inference Attacks and Defenses on Machine Learning Models. Ahmed Salem, Yang Zhang, Mathias Humbert, and 3 more authors. In Proceedings of the 26th Annual Network and Distributed System Security Symposium (NDSS), 2019
Machine learning (ML) has become a core component of many real-world applications and training data is a key factor that drives current progress. This huge success has led Internet companies to deploy machine learning as a service (MLaaS). Recently, the first membership inference attack has shown that extraction of information on the training set is possible in such MLaaS settings, which has severe security and privacy implications.
However, the early demonstrations of the feasibility of such attacks make many assumptions about the adversary, such as using multiple so-called shadow models, knowledge of the target model structure, and having a dataset from the same distribution as the target model’s training data. We relax all these key assumptions, thereby showing that such attacks are very broadly applicable at low cost and thus pose a more severe risk than previously thought. We present the most comprehensive study so far on this emerging and developing threat, using eight diverse datasets which show the viability of the proposed attacks across domains. In addition, we propose the first effective defense mechanisms against this broader class of membership inference attacks that maintain a high level of utility of the ML model.
- [NDSS] MBeacon: Privacy-Preserving Beacons for DNA Methylation Data. Inken Hagestedt, Yang Zhang, Mathias Humbert, and 4 more authors. In Proceedings of the 26th Annual Network and Distributed System Security Symposium (NDSS), 2019
Best Paper Award
The advancement of molecular profiling techniques fuels biomedical research with a deluge of data. To facilitate data sharing, the Global Alliance for Genomics and Health established the Beacon system, a search engine designed to help researchers find datasets of interest. While the current Beacon system only supports genomic data, other types of biomedical data, such as DNA methylation, are also essential for advancing our understanding in the field. In this paper, we propose the first Beacon system for DNA methylation data sharing: MBeacon. As the current genomic Beacon is vulnerable to privacy attacks, such as membership inference, and DNA methylation data is highly sensitive, we take a privacy-by-design approach to construct MBeacon.
First, we demonstrate the privacy threat by proposing a membership inference attack tailored specifically to unprotected methylation Beacons. Our experimental results show that 100 queries are sufficient to achieve a successful attack with AUC (area under the ROC curve) above 0.9. To remedy this situation, we propose a novel differential privacy mechanism, namely SVT^2, which is the core component of MBeacon. Extensive experiments over multiple datasets show that SVT^2 can successfully mitigate membership privacy risks without significantly harming utility. We further implement a fully functional prototype of MBeacon, which we make available to the research community.
- [CVCBT] Albatross – An optimistic consensus algorithm. Pascal Berrang, Philipp Styp-Rekowsky, Marvin Wissfeld, and 2 more authors. In Proceedings of the Crypto Valley Conference on Blockchain Technology (CVCBT), 2019
- [PoPETs] Privacy-Preserving Similar Patient Queries for Combined Biomedical Data. Ahmed Salem, Pascal Berrang, Mathias Humbert, and 1 more author. Proceedings on Privacy Enhancing Technologies (PoPETs), 2019
The decreasing costs of molecular profiling have fueled the biomedical research community with a plethora of new types of biomedical data, enabling a breakthrough towards more precise and personalized medicine. Naturally, the increasing availability of data also enables physicians to compare patients’ data and treatments easily and to find similar patients in order to propose the optimal therapy. Such similar patient queries (SPQs) are of utmost importance to medical practice and will be relied upon in future health information exchange systems. While privacy-preserving solutions have been previously studied, those are limited to genomic data, ignoring the different newly available types of biomedical data.
In this paper, we propose new cryptographic techniques for finding similar patients in a privacy-preserving manner with various types of biomedical data, including genomic, epigenomic and transcriptomic data as well as their combination. We design protocols for two of the most common similarity metrics in biomedicine: the Euclidean distance and the Pearson correlation coefficient. Moreover, unlike previous approaches, we account for the fact that certain locations contribute differently to a given disease or phenotype by allowing queries to be limited to the relevant locations and by assigning different weights to them. Our protocols are specifically designed to be highly efficient in terms of communication and bandwidth, requiring only one or two rounds of communication and thus enabling scalable parallel queries. We rigorously prove our protocols to be secure based on cryptographic games and instantiate our technique with three of the most important types of biomedical data – namely DNA, microRNA expression, and DNA methylation. Our experimental results show that our protocols can compute a similarity query over a typical number of positions against a database of 1,000 patients in a few seconds. Finally, we propose and formalize strategies to mitigate the threat of malicious users or hospitals.
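As a plaintext reference for the functionality such protocols compute under encryption, the sketch below evaluates the two similarity metrics with per-location weights; the weighting scheme and data layout are illustrative assumptions, and none of the cryptographic machinery is shown.

```python
import numpy as np

def weighted_euclidean(query, record, weights):
    """Smaller values indicate more similar patients at the weighted locations."""
    diff = query - record
    return float(np.sqrt(np.sum(weights * diff ** 2)))

def weighted_pearson(query, record, weights):
    """Weighted Pearson correlation coefficient, in [-1, 1]."""
    w = weights / weights.sum()
    qc = query - np.sum(w * query)
    rc = record - np.sum(w * record)
    cov = np.sum(w * qc * rc)
    return float(cov / np.sqrt(np.sum(w * qc ** 2) * np.sum(w * rc ** 2)))

# Hypothetical query restricted to disease-relevant positions via the weights.
rng = np.random.default_rng(2)
query, record = rng.random(100), rng.random(100)
weights = np.where(np.arange(100) < 20, 1.0, 0.0)  # only the first 20 sites count
print(weighted_euclidean(query, record, weights),
      weighted_pearson(query, record, weights))
```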
2018
- [PiMLAI] Revisiting Membership Inference Attacks Against Machine Learning Models. Ahmed Salem, Yang Zhang, Mathias Humbert, and 3 more authors. In Privacy in Machine Learning and Artificial Intelligence (PiMLAI), 2018
- [EuroS&P] Dissecting Privacy Risks in Biomedical Data. Pascal Berrang, Mathias Humbert, Yang Zhang, and 3 more authors. In Proceedings of the 2018 IEEE European Symposium on Security and Privacy (EuroS&P), 2018
The decreasing costs of molecular profiling have fueled the biomedical research community with a plethora of new types of biomedical data, enabling a breakthrough towards a more precise and personalized medicine. However, the release of these intrinsically highly sensitive data poses a severe new privacy threat. While biomedical data is largely associated with our health, there also exist various correlations between different types of biomedical data, along the temporal dimension, and also between family members. However, so far, the security community has focused on privacy risks stemming from genomic data, largely overlooking the manifold interdependencies between other biomedical data.
In this paper, we present a generic framework for quantifying the privacy risks in biomedical data, taking into account the various interdependencies between data (i) of different types, (ii) from different individuals, and (iii) at different times. To this end, we rely on a Bayesian network model that allows us to take all aforementioned dependencies into account and run exact probabilistic inference attacks very efficiently. Furthermore, we introduce a generic algorithm for building the Bayesian network, which encompasses expert knowledge for known dependencies, such as genetic inheritance laws, and learns previously unknown dependencies from the data. Then, we conduct a thorough inference risk evaluation with a very rich dataset containing genomic and epigenomic data of mothers and children over multiple years. Besides effective probabilistic inference, we further demonstrate that our Bayesian network model can also serve as a building block for other attacks. We show that, with our framework, an adversary can efficiently identify parent-child relationships based on methylation data with a success rate of 95%.
2017
- [S&P] Identifying Personal DNA Methylation Profiles by Genotype Inference. Michael Backes, Pascal Berrang, Matthias Bieg, and 4 more authors. In Proceedings of the 38th IEEE Symposium on Security and Privacy (S&P), 2017
Since the first whole-genome sequencing, the biomedical research community has made significant steps towards a more precise, predictive and personalized medicine. Genomic data is nowadays widely considered privacy-sensitive and consequently protected by strict regulations and released only after careful consideration. Various additional types of biomedical data, however, are not shielded by any dedicated legal means and consequently disseminated much less thoughtfully. This holds true in particular for DNA methylation data, one of the most important and well-understood epigenetic elements influencing human health.
In this paper, we show that, in contrast to the aforementioned belief, releasing one’s DNA methylation data causes privacy issues akin to releasing one’s actual genome. We show that a small subset of methylation regions influenced by genomic variants is already sufficient to infer parts of someone’s genome and to map this DNA methylation profile to the corresponding genome. Notably, we show that such re-identification is possible with 97.5% accuracy, relying on a dataset of more than 2,500 genomes, and that we can reject all wrongly matched genomes using an appropriate statistical test. We provide means for countering this threat by proposing a novel cryptographic scheme for privately classifying tumors that enables a privacy-respecting medical diagnosis in a common clinical setting. The scheme relies on a combination of random forests and homomorphic encryption, and it is proven secure in the honest-but-curious model. We evaluate this scheme on real DNA methylation data, and show that we can keep the computational overhead to acceptable values for our application scenario.
2016
- [USENIX] Privacy in Epigenetics: Temporal Linkability of MicroRNA Expression Profiles. Michael Backes, Pascal Berrang, Anne Hecksteden, and 3 more authors. In Proceedings of the 25th USENIX Security Symposium (Security), 2016
The decreasing cost of molecular profiling tests, such as DNA sequencing, and the consequent increasing availability of biological data are revolutionizing medicine, but at the same time create novel privacy risks. The research community has already proposed a plethora of methods for protecting genomic data against these risks. However, the privacy risks stemming from epigenetics, which bridges the gap between the genome and our health characteristics, have been largely overlooked so far, even though epigenetic data such as microRNAs (miRNAs) are no less privacy sensitive. This lack of investigation is attributed to the common belief that the inherent temporal variability of miRNAs shields them from being tracked and linked over time.
In this paper, we show that, contrary to this belief, miRNA expression profiles can be successfully tracked over time, despite their variability. Specifically, we show that two blood-based miRNA expression profiles taken with a time difference of one week from the same person can be matched with a success rate of 90%. We furthermore observe that this success rate stays almost constant when the time difference is increased from one week to one year. In order to mitigate the linkability threat, we propose and thoroughly evaluate two countermeasures: (i) hiding a subset of disease-irrelevant miRNA expressions, and (ii) probabilistically sanitizing the miRNA expression profiles. Our experiments show that the second mechanism provides a better trade-off between privacy and disease-prediction accuracy.
- [CCS] Membership Privacy in MicroRNA-based Studies. Michael Backes, Pascal Berrang, Mathias Humbert, and 1 more author. In Proceedings of the 23rd ACM Conference on Computer and Communications Security (CCS), 2016
The continuous decrease in cost of molecular profiling tests is revolutionizing medical research and practice, but it also raises new privacy concerns. One of the first attacks against privacy of biological data, proposed by Homer et al. in 2008, showed that, by knowing parts of the genome of a given individual and summary statistics of a genome-based study, it is possible to detect if this individual participated in the study. Since then, a lot of work has been carried out to further study the theoretical limits and to counter the genome-based membership inference attack. However, genomic data are by no means the only or the most influential biological data threatening personal privacy. For instance, whereas the genome informs us about the risk of developing some diseases in the future, epigenetic biomarkers, such as microRNAs, are directly and deterministically affected by our health condition including most common severe diseases.
In this paper, we show that the membership inference attack also threatens the privacy of individuals contributing their microRNA expressions to scientific studies. Our results on real and public microRNA expression data demonstrate that disease-specific datasets are especially prone to membership detection, offering a true-positive rate of up to 77% at a false-positive rate of less than 1%. We present two attacks: one relying on the L1 distance and the other based on the likelihood-ratio test. We show that the likelihood-ratio test provides the highest adversarial success and we derive a theoretical limit on this success. In order to mitigate the membership inference, we propose and evaluate both a differentially private mechanism and a hiding mechanism. We also consider two types of adversarial prior knowledge for the differentially private mechanism and show that, for relatively large datasets, this mechanism can protect the privacy of participants in miRNA-based studies against strong adversaries without degrading the data utility too much. Based on our findings and given the current number of miRNAs, we recommend releasing only summary statistics of datasets containing at least a couple of hundred individuals.
- [UEOP] On Epigenomic Privacy: Tracking Personal MicroRNA Expression Profiles over Time. Michael Backes, Pascal Berrang, Anne Hecksteden, and 3 more authors. In Workshop on Understanding and Enhancing Online Privacy (UEOP), affiliated with NDSS, 2016
- [GenoPri] Simulating the Large-scale Erosion of Genomic Privacy Over Time. Michael Backes, Pascal Berrang, Mathias Humbert, and 2 more authors. In 3rd International Workshop on Genome Privacy and Security (GenoPri), 2016. Selected for publication in IEEE/ACM Transactions on Computational Biology and Bioinformatics.
The dramatically decreasing costs of DNA sequencing have led more than a million people to date to have their genotypes sequenced. Moreover, these individuals increasingly make their genomic data publicly available, and thereby create unique privacy threats not only for themselves, but also for their relatives because of their DNA similarities. More generally, an entity that gains access to a significant fraction of sequenced genotypes from a given population might be able to infer even the genomes of unsequenced individuals by relying on available data.
In this paper, we propose a simulation-based model for quantifying the impact of continuously sequencing and publicizing personal genomic data on a population’s genomic privacy. Our simulation probabilistically models data sharing by individuals and additionally takes into account the influence on genomic privacy of geopolitical events such as migration, and sociological trends such as interracial marriage. As an example, we instantiate our simulation with a sample population of 1,000 individuals, and evaluate the evolution of privacy under different settings over either thousands of genomic variants or a subset of variants influencing the phenotype. Our findings notably demonstrate that an increasing sharing rate of genomic data in the future entails a substantial negative effect on the privacy of all older generations. Moreover, we find that mixed populations, due to their large genomic diversity, face a less severe erosion of genomic privacy over time than more homogeneous populations. However, even when no data is shared, the genomic privacy averaged over a large number of variants is already very low since mere population allele frequencies already reveal a lot of information about the values of the genomic variants. By focusing on a subset of sensitive variants, we observe a higher genetic diversity in the population. Thus, genomic-data sharing can be much more detrimental to the privacy of the most sensitive variants.
- From Zoos to Safaris – From Closed-World Enforcement to Open-World Assessment of Privacy. Michael Backes, Pascal Berrang, and Praveen Manoharan. In Foundations of Security Analysis and Design VIII, 2016
- [WPES] Profile Linkability despite Anonymity in Social Media Systems. Michael Backes, Pascal Berrang, Oana Goga, and 2 more authors. In Proceedings of the 15th ACM Workshop on Privacy in the Electronic Society (WPES), 2016
2015
- How well do you blend into the crowd? Michael Backes, Pascal Berrang, and Praveen Manoharan. d-convergence: Assessing identity disclosure risks in large-scale open web settings, 2015
2012
- A comparative analysis of decentralized power grid stabilization strategies. Arnd Hartmanns, Holger Hermanns, and Pascal Berrang. In Proceedings of the Winter Simulation Conference, 2012
- Dependability results for power grids with decentralized stabilization strategies. Pascal Berrang, Jonathan Bogdoll, Ernst Moritz Hahn, and 2 more authors. AVACS Technical Report, 2012