Privacy-Preserving AI Safety Verification
ARIA-funded project developing mathematical foundations for verifying AI safety guarantees without revealing sensitive model or data details.
Problem. As AI systems are deployed in high-stakes settings, we need ways to verify that they meet safety guarantees. But safety verification typically requires access to model internals and training data, which is in tension with intellectual property and data privacy.
Approach. We develop the mathematical foundations of privacy-preserving AI safety verification using zero-knowledge proofs: a developer can prove that a model satisfies a safety guarantee while keeping its weights and training data private. This combines formal verification of neural networks with cryptographic proof systems.
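To make the setting concrete, here is a minimal Python sketch of the workflow under loose assumptions: a toy one-neuron network, interval arithmetic as the formal verification step, and a hash in place of a real cryptographic commitment. Every function name is hypothetical, and the certificate it produces is an ordinary transcript, not a zero-knowledge proof; it only marks where one would slot in.

```python
# Minimal sketch of the intended workflow, NOT the project's protocol.
# All names are hypothetical; SHA-256 stands in for a real commitment
# scheme, and the "certificate" is a plain transcript, not a ZK proof.
import hashlib
import json

def commit_model(weights: dict) -> str:
    """Commit to the private weights (hash as a commitment stand-in)."""
    data = json.dumps(weights, sort_keys=True).encode()
    return hashlib.sha256(data).hexdigest()

def interval_forward(weights: dict, lo: float, hi: float) -> tuple:
    """Propagate the input interval [lo, hi] through a toy one-neuron
    ReLU network y = relu(w*x + b) using interval arithmetic."""
    w, b = weights["w"], weights["b"]
    ends = sorted((w * lo + b, w * hi + b))      # handles negative w
    return max(ends[0], 0.0), max(ends[1], 0.0)  # ReLU is monotone

def prove_safety(weights: dict, input_box: tuple, output_cap: float) -> dict:
    """Prover side: check the safety property on the private weights and
    emit a certificate binding the claim to a commitment."""
    _, out_hi = interval_forward(weights, *input_box)
    assert out_hi <= output_cap, "property does not hold; nothing to prove"
    # In the real protocol, this interval computation would itself run
    # inside a zero-knowledge proof system.
    return {"commitment": commit_model(weights),
            "claim": {"input_box": input_box, "output_cap": output_cap}}

# The prover holds the weights; the verifier sees only the certificate.
private_weights = {"w": 0.5, "b": -0.1}
cert = prove_safety(private_weights, input_box=(0.0, 1.0), output_cap=1.0)
print("commitment:", cert["commitment"][:16], "... claim:", cert["claim"])
```

In the actual setting, the bound computation would be carried out inside a zero-knowledge proof system, so a verifier could check the claimed safety property against the commitment alone, learning nothing about the weights themselves.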
Impact. Our work enables a new paradigm where AI developers can provide mathematical assurances of safety to regulators and the public without exposing proprietary models or sensitive data.
Funded by ARIA under the Mathematics for Safe AI programme. Joint work with Mirco Giacobbe and Yang Zhang (CISPA).