Zero Knowledge Proofs in Machine Learning: A Comprehensive Guide

Insights 22/10/2024

The convergence of zero-knowledge proofs (ZKPs) and machine learning (ML) is opening up new possibilities for creating privacy-preserving and secure AI systems. Zero-Knowledge Proof Machine Learning (ZKML) allows for the verification of machine learning processes without exposing sensitive data or model parameters. This blog will explore the key concepts, applications, and challenges of ZKML, drawing insights from recent developments and practical use cases.

Understanding Zero-Knowledge Proofs

Zero-Knowledge Proofs are cryptographic methods that enable one party (the prover) to demonstrate the truth of a statement to another party (the verifier) without revealing any additional information. This technology is crucial for scenarios where privacy and data security are of utmost importance, such as in financial transactions, healthcare, and identity verification.
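To make the prover/verifier interaction concrete, here is a minimal sketch of a Schnorr-style proof of knowledge of a discrete logarithm, made non-interactive via the Fiat-Shamir heuristic. The group parameters and secret value are illustrative toys (real deployments use roughly 256-bit groups), and the function names are ours; this shows the idea, not a production protocol.

```python
import hashlib
import secrets

# Toy Schnorr-style proof of knowledge of a discrete log, made
# non-interactive with the Fiat-Shamir heuristic. Parameters are
# deliberately tiny for illustration only.
p, q, g = 23, 11, 4          # g generates the order-q subgroup of Z_p*

def prove(x):
    """Prove knowledge of x such that y = g^x mod p, without revealing x."""
    y = pow(g, x, p)
    r = secrets.randbelow(q)          # fresh random nonce
    t = pow(g, r, p)                  # commitment
    c = int(hashlib.sha256(f"{g}{y}{t}".encode()).hexdigest(), 16) % q
    s = (r + c * x) % q               # response
    return y, (t, s)

def verify(y, proof):
    t, s = proof
    c = int(hashlib.sha256(f"{g}{y}{t}".encode()).hexdigest(), 16) % q
    return pow(g, s, p) == (t * pow(y, c, p)) % p

y, proof = prove(x=7)                 # prover's secret x = 7
print(verify(y, proof))               # True: statement holds, x stays hidden
```

The verifier learns that the prover knows some `x` with `y = g^x mod p`, but the transcript reveals nothing else about `x`.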

Generating a zero-knowledge proof requires substantially more processing power than the original computation, often by orders of magnitude. This means some computations remain impractical to prove, because generating the proof would take too long even on the most advanced hardware available today. However, recent advances in cryptography, hardware, and distributed systems have made zero-knowledge proofs feasible for increasingly intensive computations. These advances have opened up the design space for new applications by enabling protocols that can leverage proofs of heavy computations.

Privacy-Preserving Model Training

In federated learning, multiple parties collaborate to train a global model without sharing their private data. Zero-knowledge proofs can be used to verify the correctness of each participant’s contribution to the training process. This ensures that the global model is trained accurately without any data leakage, enhancing trust among participants. For example, zk-SNARKs (Succinct Non-Interactive Arguments of Knowledge) can be used to verify that each participant’s model updates are computed correctly, thereby ensuring the integrity of the overall training process.
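As a rough sketch of that flow, the toy below has each participant attach a commitment to its update, which the aggregator checks before averaging. The hash-plus-recomputation check merely stands in for a zk-SNARK (a real proof would avoid re-running the computation and could hide the data); the "training" rule and all names are invented for illustration.

```python
import hashlib
import json

# Toy federated averaging where each client's update carries a proof the
# aggregator can check. A hash over (update, data) plus recomputation
# stands in for a zk-SNARK: it shows the shape of the protocol, not real
# zero-knowledge, since verification here re-runs the computation.

def local_update(weights, data):
    # Hypothetical "training" step: nudge weights toward the local mean.
    mean = sum(data) / len(data)
    return [w + 0.1 * (mean - w) for w in weights]

def commit(update, data):
    return hashlib.sha256(json.dumps([update, data]).encode()).hexdigest()

def verify_update(weights, data, update, proof):
    return local_update(weights, data) == update and commit(update, data) == proof

global_w = [0.0, 0.0]
clients = [[1.0, 2.0], [3.0, 5.0]]
updates = []
for data in clients:
    u = local_update(global_w, data)
    updates.append((u, data, commit(u, data)))

# Aggregator accepts only updates whose proof verifies, then averages.
ok = [u for u, data, prf in updates if verify_update(global_w, data, u, prf)]
global_w = [sum(ws) / len(ok) for ws in zip(*ok)]
print([round(w, 3) for w in global_w])   # [0.275, 0.275]
```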

Verifiable Machine Learning Inference

ZKPs can also be applied to verify the correctness of machine learning inferences. This is particularly useful in sensitive applications such as healthcare or finance, where it is essential to ensure the reliability of model predictions without exposing the underlying data or model parameters. Recent research has demonstrated the use of zk-SNARKs to verify neural network inferences on encrypted data, enabling secure and private ML predictions.
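The commit/claim/check flow behind verifiable inference can be sketched as below, using a trivial linear model in place of a neural network. The auditor here recomputes the inference from the committed weights, whereas a real zk-SNARK would let verification succeed without access to the weights or data; everything named here is illustrative.

```python
import hashlib
import json

# Sketch of verifiable inference via commitment plus recomputation. The
# server commits to its model weights up front; an auditor holding the
# weights can later check both the commitment and the claimed output.
# A real ZKML system would replace the recomputation with a succinct
# proof so the auditor never sees the weights.

def predict(weights, x):
    # Tiny linear model standing in for a neural network.
    return sum(w * xi for w, xi in zip(weights, x))

def commit(weights):
    return hashlib.sha256(json.dumps(weights).encode()).hexdigest()

weights = [0.5, -1.0, 2.0]
commitment = commit(weights)           # published once, binds the model

x = [1.0, 2.0, 3.0]
claimed = predict(weights, x)          # server's claimed inference

# Auditor: the weights match the commitment, and the output recomputes.
assert commit(weights) == commitment
assert predict(weights, x) == claimed
print(claimed)                         # 4.5
```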

Motivation and current efforts in ZKML 

As AI-generated content becomes more similar to human-created work, zero-knowledge cryptography offers a promising way to verify that certain outputs were produced by specific models, like GPT-4 or DALL-E 2, without revealing the input or the model. This could be especially useful in fields like healthcare, where sensitive data can be processed by machine learning models while keeping the input private from third parties.

Zero-knowledge machine learning (ZKML) focuses on creating proofs for the inference phase of a model (when it makes predictions) rather than the more complex training phase. Although current zero-knowledge systems and hardware aren’t yet capable of handling large models like modern language models, progress has been made in creating proofs for smaller models.

The “awesome-zkml” GitHub repository collects resources on ZKML. For example, Modulus Labs’ paper “The Cost of Intelligence” benchmarks different zero-knowledge proof systems across a variety of model sizes. One notable finding is that with the plonky2 proving system, it’s possible to create proofs for models with around 18 million parameters in about 50 seconds using a powerful AWS machine. Figure 1 illustrates the scaling behavior of different proving systems as the number of parameters of a neural network is increased:

Figure 1: Proving time across different proving systems as the number of neural network parameters increases.

Another initiative, Zkonduit’s ezkl library, allows engineers to create zero-knowledge proofs for machine learning models exported in ONNX format, making it easier for ML developers to prove model outputs. Several teams are also working on improving zero-knowledge technology and building specialized hardware to accelerate computations, especially proving and verification. As these technologies mature, we’ll likely see larger models proved on less powerful machines in less time, thanks to advances in hardware, proof systems, and more efficient protocols. This progress will open the door to new applications and use cases for ZKML.

Use case examples

Figure 2: Venn diagram relating zero-knowledge properties to ML use cases.

To determine whether ZKML can be applied to a specific use case, we can examine how the properties of zero-knowledge (ZK) proofs would support various applications. This can be illustrated through a Venn diagram.

COMPUTATIONAL INTEGRITY (VALIDITY ML) 

Validity proofs (SNARKs/STARKs) can be used to demonstrate that a computation was performed correctly. In the context of ML, this means proving that a model’s inference is valid, i.e., that the model produced a specific output from a specific input (for example, verifying the results of a task). Such proofs make it easy to verify that an output really is the result of running a given model on a given input. This enables ML models to run off-chain on specialized hardware, with ZK proofs that can be verified cheaply on-chain.

Example: Giza is helping Yearn (a DeFi yield aggregation protocol) prove that certain complex ML-based yield strategies are being executed correctly on-chain.
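The off-chain/on-chain split described above can be sketched as follows. The binding hash here merely stands in for a succinct SNARK/STARK proof (which would be verifiable without re-running the model), and the model identifier, arithmetic, and function names are hypothetical.

```python
import hashlib

# Minimal sketch of off-chain proving and on-chain verification. The
# heavy ML inference runs off-chain; only a short "proof" (here a hash
# binding model id, input, and output -- a stand-in for a succinct
# SNARK/STARK) is posted for cheap verification.

MODEL_ID = "yield-strategy-v1"   # hypothetical model identifier

def run_model_off_chain(x):
    y = 2 * x + 1                        # stand-in for expensive inference
    proof = hashlib.sha256(f"{MODEL_ID}:{x}:{y}".encode()).hexdigest()
    return y, proof

def verify_on_chain(x, y, proof):
    # A real on-chain verifier checks a succinct proof in constant time;
    # this stub just recomputes the binding hash.
    return hashlib.sha256(f"{MODEL_ID}:{x}:{y}".encode()).hexdigest() == proof

y, proof = run_model_off_chain(10)
print(y, verify_on_chain(10, y, proof))   # 21 True
```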

ML TRANSPARENCY AS A SERVICE (MLAAS)

When companies provide access to ML models via their APIs, it’s difficult for users to verify whether the service provider is actually running the model they claim to be running, as the API functions as a black box. Attaching validity proofs to the APIs of ML models would bring transparency to users, allowing them to verify which model they are using.

ZK ANOMALY/FRAUD DETECTION 

ZK proofs could be generated to detect exploitation or fraud. Anomaly detection models can be trained on smart contract data and agreed upon by DAOs as valuable metrics, enabling security processes such as pausing contracts in a preventive, proactive manner. Some startups are exploring the use of ML models for security in the context of smart contracts, and ZK proofs of anomaly detection could be the next step.

PRIVACY (ZKML)

Beyond validity proofs, ZK can also hide parts of the computation to enable privacy-preserving ML applications. Here are a few examples:

  • Decentralized Kaggle: Proof that a model has higher than X% accuracy on some test data without revealing the model’s weights.
  • Privacy-preserving inference: This involves using patient data for medical diagnostics in a privacy-preserving manner. Instead of sharing entire patient datasets with machine learning models, patients can input their data into the model, but only the diagnostic result is shared with the patient. For example, if a patient uses a model to check for cancer, the test result would be sent directly to the patient, keeping their information private.
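As a sketch of the “decentralized Kaggle” idea above, the toy below publishes only a commitment to the model and an accuracy claim; a judge holding the test set then checks the claim. The judge’s recomputation stands in for a zero-knowledge accuracy proof that would keep the weights hidden, and all names and data are illustrative.

```python
import hashlib
import json

# Toy "prove accuracy above a threshold" flow. The prover publishes a
# commitment to its model and a claimed accuracy bound; the judge opens
# the commitment and re-evaluates. A real ZKML setup would replace the
# judge's recomputation with a ZK proof so the weights never leave the
# prover.

def predict(weights, x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else 0

def accuracy(weights, test_set):
    return sum(predict(weights, x) == y for x, y in test_set) / len(test_set)

weights = [1.0, -1.0]                       # prover's secret model
commitment = hashlib.sha256(json.dumps(weights).encode()).hexdigest()

test_set = [([2, 1], 1), ([1, 3], 0), ([5, 1], 1), ([0, 2], 0)]

# Judge: the opened weights match the commitment, and the claim holds.
assert hashlib.sha256(json.dumps(weights).encode()).hexdigest() == commitment
print(accuracy(weights, test_set) >= 0.75)  # True
```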

These use cases illustrate the potential of ZKML in enabling transparency, security, and privacy for machine learning applications.

As technology advances, the integration of zero-knowledge machine learning (ZKML) is expected to become more seamless. Ongoing research is tackling current challenges, while collaborations between cryptography and machine learning experts are speeding up the creation of practical ZKML frameworks. In a world where privacy is becoming increasingly scarce, ZKML offers a promising solution. By enabling organizations to leverage the power of machine learning while safeguarding individual privacy, this approach represents a crucial step toward a more ethical and secure AI landscape. As AI and machine learning become part of our everyday lives, it is more important than ever to be able to trust the models and data that influence the safety and security of our world.

Start creating your own ZKP project with our expert team, SotaZK now!