Research Assistant and PhD Student
Information Security and Cryptography Group, Saarland University
Graduate Studies in Computer Science
|2013||Student Trainee at SAP|
B.Sc. in Computer Science
|2015-2016||Member of the Organizing Committee of the 1st IEEE European Symposium on Security & Privacy|
|2015-present||Member of the hiring committees for several professorships|
|2014-present||Member of the Examination Board for the Cybersecurity Bachelor Studies|
|2014-present||Member of our University's CTF Team saarsec|
|2012-2014||Elected Member of the Students' Representatives Council for Computer Science|
|Summer Term 2016||Advisor for Theory of Modern Privacy Research (Seminar)|
|Winter Term 2015/16||Teaching Assistant for Grundlagen der Cybersicherheit (Foundations of Cybersecurity)|
|Winter Term 2012/13||Student Teaching Assistant for Mathematik für Informatiker III (Mathematics for Computer Scientists 3)|
|Summer Term 2012||Student Teaching Assistant for Mathematics Preparatory Course|
|Winter Term 2011/12||Student Teaching Assistant for Programmierung 1 (Programming 1)|
|Winter Term 2011/12||Student Teaching Assistant for Mathematics Preparatory Course|
Since the first whole-genome sequencing, the biomedical research community has made significant steps towards a more precise, predictive and personalized medicine. Genomic data is nowadays widely considered privacy-sensitive and is consequently protected by strict regulations and released only after careful consideration. Various other types of biomedical data, however, are not shielded by any dedicated legal means and are consequently disseminated much less thoughtfully. This holds true in particular for DNA methylation data, one of the most important and well-understood epigenetic elements influencing human health.
In this paper, we show that, contrary to this belief, releasing one's DNA methylation data causes privacy issues akin to releasing one's actual genome. We show that already a small subset of methylation regions influenced by genomic variants is sufficient to infer parts of someone's genome, and to further map this DNA methylation profile to the corresponding genome. Notably, we show that such re-identification is possible with 97.5% accuracy, relying on a dataset of more than 2500 genomes, and that we can reject all wrongly matched genomes using an appropriate statistical test. We provide means for countering this threat by proposing a novel cryptographic scheme for privately classifying tumors that enables a privacy-respecting medical diagnosis in a common clinical setting. The scheme relies on a combination of random forests and homomorphic encryption, and it is proven secure in the honest-but-curious model. We evaluate this scheme on real DNA methylation data, and show that we can keep the computational overhead to acceptable values for our application scenario.
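The two-step attack can be illustrated with a minimal sketch: first discretize methylation levels at variant-influenced sites into genotype calls, then match the inferred calls against a genome database. The per-site thresholds, the 0/1/2 genotype encoding, and the majority-agreement matching are illustrative assumptions, not the paper's actual inference model or statistical test.

```python
def infer_genotypes(methylation, thresholds):
    """Toy inference: at methylation sites driven by a nearby genomic variant,
    discretize the methylation level into one of three genotype classes.
    `thresholds` holds hypothetical per-site cut points (low, high)."""
    calls = []
    for level, (low, high) in zip(methylation, thresholds):
        if level < low:
            calls.append(0)      # homozygous reference
        elif level < high:
            calls.append(1)      # heterozygous
        else:
            calls.append(2)      # homozygous alternative
    return calls

def best_match(inferred, genome_db):
    """Return the index of the genome agreeing with most inferred calls."""
    def agreement(genome):
        return sum(a == b for a, b in zip(inferred, genome))
    return max(range(len(genome_db)), key=lambda i: agreement(genome_db[i]))
```

A real attack would additionally need a rejection test (as in the abstract) to discard matches when the true genome is not in the database.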
The dramatically decreasing costs of DNA sequencing have led more than a million people to date to have their genotypes sequenced. Moreover, these individuals increasingly make their genomic data publicly available, and thereby create unique privacy threats not only for themselves, but also for their relatives because of their DNA similarities. More generally, an entity that gains access to a significant fraction of sequenced genotypes from a given population might be able to infer even the genomes of unsequenced individuals by relying on available data.
In this paper, we propose a simulation-based model for quantifying the impact of continuously sequencing and publicizing personal genomic data on a population’s genomic privacy. Our simulation probabilistically models data sharing by individuals and additionally takes into account the influence on genomic privacy of geopolitical events such as migration, and sociological trends such as interracial marriage. As an example, we instantiate our simulation with a sample population of 1,000 individuals, and evaluate the evolution of privacy under different settings over either thousands of genomic variants or a subset of variants influencing the phenotype. Our findings notably demonstrate that an increasing sharing rate of genomic data in the future entails a substantial negative effect on the privacy of all older generations. Moreover, we find that mixed populations, due to their large genomic diversity, face a less severe erosion of genomic privacy over time than more homogeneous populations. However, even when no data is shared, the genomic privacy averaged over a large number of variants is already very low since mere population allele frequencies already reveal a lot of information about the values of the genomic variants. By focusing on a subset of sensitive variants, we observe a higher genetic diversity in the population. Thus, genomic-data sharing can be much more detrimental for the privacy of the most sensitive variants.
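A drastically simplified Monte Carlo sketch conveys the core effect: an adversary who knows shared genomes exactly and falls back on the population-majority allele for everyone else already infers most variant values correctly. Binary variants, the uniform allele-frequency range, and all parameters are toy assumptions; the paper's simulation additionally models kinship, migration, and interracial marriage.

```python
import random

def expected_inference_accuracy(num_individuals=1000, num_variants=100,
                                share_rate=0.2, seed=0):
    """Toy Monte Carlo: the adversary knows shared genomes exactly and guesses
    the population-majority allele for non-sharers. Returns the fraction of
    variant values inferred correctly (higher means less privacy)."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.05, 0.95) for _ in range(num_variants)]
    genomes = [[1 if rng.random() < f else 0 for f in freqs]
               for _ in range(num_individuals)]
    shared = [rng.random() < share_rate for _ in range(num_individuals)]
    correct = total = 0
    for genome, is_shared in zip(genomes, shared):
        for value, f in zip(genome, freqs):
            guess = value if is_shared else (1 if f >= 0.5 else 0)
            correct += (guess == value)
            total += 1
    return correct / total
```

Even with `share_rate=0.0` the accuracy stays well above 50%, mirroring the abstract's observation that allele frequencies alone already leak substantial information.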
The continuous decrease in the cost of molecular profiling tests is revolutionizing medical research and practice, but it also raises new privacy concerns. One of the first attacks against the privacy of biological data, proposed by Homer et al. in 2008, showed that, by knowing parts of the genome of a given individual and summary statistics of a genome-based study, it is possible to detect whether this individual participated in the study. Since then, a lot of work has been carried out to further study the theoretical limits and to counter the genome-based membership inference attack. However, genomic data are by no means the only or the most influential biological data threatening personal privacy. For instance, whereas the genome informs us about the risk of developing some diseases in the future, epigenetic biomarkers, such as microRNAs, are directly and deterministically affected by our health condition, including most common severe diseases.
In this paper, we show that the membership inference attack also threatens the privacy of individuals contributing their microRNA expressions to scientific studies. Our results on real and public microRNA expression data demonstrate that disease-specific datasets are especially prone to membership detection, offering a true-positive rate of up to 77% at a false-positive rate of less than 1%. We present two attacks: one relying on the L_1 distance and the other based on the likelihood-ratio test. We show that the likelihood-ratio test provides the highest adversarial success and we derive a theoretical limit on this success. In order to mitigate the membership inference, we propose and evaluate both a differentially private mechanism and a hiding mechanism. We also consider two types of adversarial prior knowledge for the differentially private mechanism and show that, for relatively large datasets, this mechanism can protect the privacy of participants in miRNA-based studies against strong adversaries without degrading the data utility too much. Based on our findings and given the current number of miRNAs, we recommend releasing summary statistics only for datasets containing at least a couple of hundred individuals.
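The likelihood-ratio test can be sketched minimally: assuming independent Gaussian attributes with a known common variance (a simplification of the paper's model), membership is scored by comparing the target's likelihood under the pool statistics against a reference population. The function and its parameters are illustrative, not the paper's exact formulation.

```python
def lrt_statistic(target, pool_means, ref_means, sigma=1.0):
    """Log-likelihood ratio of the target's profile under the pool model
    versus the reference-population model, assuming independent Gaussian
    attributes with known variance sigma. Large positive scores indicate
    the target is likely a member of the pool."""
    score = 0.0
    for x, mu_pool, mu_ref in zip(target, pool_means, ref_means):
        score += (-(x - mu_pool) ** 2 + (x - mu_ref) ** 2) / (2 * sigma ** 2)
    return score
```

Thresholding the score at some cut-off yields the in/out decision; moving the threshold trades the true-positive rate against the false-positive rate.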
The decreasing cost of molecular profiling tests, such as DNA sequencing, and the consequent increasing availability of biological data are revolutionizing medicine, but at the same time create novel privacy risks. The research community has already proposed a plethora of methods for protecting genomic data against these risks. However, the privacy risks stemming from epigenetics, which bridges the gap between the genome and our health characteristics, have been largely overlooked so far, even though epigenetic data such as microRNAs (miRNAs) are no less privacy-sensitive. This lack of investigation is attributed to the common belief that the inherent temporal variability of miRNAs shields them from being tracked and linked over time.
In this paper, we show that, contrary to this belief, miRNA expression profiles can be successfully tracked over time, despite their variability. Specifically, we show that two blood-based miRNA expression profiles taken with a time difference of one week from the same person can be matched with a success rate of 90%. We furthermore observe that this success rate stays almost constant when the time difference is increased from one week to one year. In order to mitigate the linkability threat, we propose and thoroughly evaluate two countermeasures: (i) hiding a subset of disease-irrelevant miRNA expressions, and (ii) probabilistically sanitizing the miRNA expression profiles. Our experiments show that the second mechanism provides a better trade-off between privacy and disease-prediction accuracy.
Following the genomic revolution and the consequent deluge of DNA data, a lot of research has been carried out to better understand and protect genomic privacy. However, genomics is only the tip of the iceberg of a broader epigenomic breakthrough currently going on. In order to shed light on privacy issues stemming from epigenomic data, we study how personal microRNA expression profiles can be tracked over time. By relying on principal component analysis and graph matching, we show that, despite the variability of gene expression, it is possible to track one or multiple expression profiles at different points in time. Specifically, we show that blood miRNA profiles of healthy athletes collected at a one-week interval can be matched together with a success rate of 90%. We also find that blood expression profiles are much easier to link over time than plasma profiles, which yield a success rate roughly half as high. Our results for plasma microRNA expression profiles are confirmed by another dataset of patients with lung cancer collected over a time period of more than 18 months. This second dataset also shows that a greater time shift between two miRNA expression datasets slightly decreases the attack's success.
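The linking step can be sketched as a minimum-cost assignment between the two snapshots. This toy version omits the PCA projection and uses brute-force enumeration on raw Euclidean distances, which is only feasible for small cohorts; the Hungarian algorithm (e.g. SciPy's `linear_sum_assignment`) would scale to realistic dataset sizes.

```python
import itertools
import math

def link_profiles(profiles_t1, profiles_t2):
    """Match each expression profile at time t1 to one at time t2 by
    minimizing the total Euclidean distance over all assignments.
    Returns a tuple m with m[i] = index in profiles_t2 matched to i."""
    n = len(profiles_t1)

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    best, best_cost = None, float("inf")
    for perm in itertools.permutations(range(n)):
        cost = sum(dist(profiles_t1[i], profiles_t2[perm[i]]) for i in range(n))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best
```

The attack succeeds for a profile when the assignment maps it to the same individual's later snapshot; the 90% figure above is the fraction of profiles for which this holds.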
In this paper, we develop a user-centric privacy framework for quantitatively assessing the exposure of personal information in open settings. Our formalization addresses key challenges posed by such open settings, such as the necessity of user- and context-dependent privacy requirements. As a sanity check, we show that hard non-disclosure guarantees are impossible to achieve in open settings.
In the second part, we provide an instantiation of our framework to address the identity disclosure problem, leading to the novel notion of d-convergence to assess the linkability of identities across online communities. Since user-generated text content plays a major role in linking identities between Online Social Networks, we further extend this linkability model to assess the effectiveness of countermeasures against linking authors of text content by their writing style.
We experimentally evaluate both of these instantiations by applying them to suitable data sets: we provide a large-scale evaluation of the linkability model on a collection of 15 million comments collected from the Online Social Network Reddit, and evaluate the effectiveness of four semantics-retaining countermeasures and their combinations on the Extended-Brennan-Greenstadt Adversarial Corpus. Through these evaluations we validate the notion of d-convergence for assessing the linkability of entities in our Reddit data set and explore the practical impact of countermeasures on the importance of standard writing-style features for identifying authors.
This paper reports on formal behavioral models of power grids with a substantial share of photovoltaic micro-generation. Simulation studies show that the current legislative framework in Germany can induce frequency oscillations. This phenomenon is indeed recognized by the German Federal Network Agency responsible for overseeing the national power grids, and new regulations are currently being identified to counter this phenomenon. We study the currently valid proposal, and compare it with a set of alternative approaches that take up and combine ideas from communication protocol design, such as additive-increase/multiplicative-decrease known from TCP, and exponential backoff used in CSMA variations. We classify these alternatives with respect to their availability and goodput. The models are specified in the modeling language MODEST, and simulated with the help of the modes simulator.
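The additive-increase/multiplicative-decrease idea carried over from TCP can be sketched as a per-step controller for a PV inverter's feed-in power: back off multiplicatively on over-frequency, otherwise ramp up additively toward the panel's maximum. The frequency threshold and step sizes below are hypothetical illustrations, not values from the studied proposals.

```python
def aimd_step(power, freq, p_max=1.0, f_over=50.2, add=0.05, mult=0.5):
    """One AIMD control step for a PV inverter's normalized feed-in power.
    Hypothetical parameters: over-frequency threshold f_over in Hz,
    additive ramp-up `add` and multiplicative back-off factor `mult`."""
    if freq > f_over:
        return power * mult          # multiplicative decrease on over-frequency
    return min(p_max, power + add)   # additive increase, capped at maximum
```

Because inverters that back off proportionally desynchronize instead of all switching at a shared threshold, AIMD-style rules can damp the oscillations that a hard cut-off provokes.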