zkProof
Starting from scratch
What is it?
TL;DR -> It is a cryptographic process where one party can prove to another party that a certain statement is true without revealing any information beyond the fact that the statement is true.
Zero-knowledge proofs are one of the most powerful tools devised by cryptographers. They were born in 1989, when MIT researchers were looking for a way for one party (a Prover) to convince another (a Verifier) that a mathematical statement is true.
The concern was: "What if the Prover and the Verifier do not trust each other?" -> How much will the Verifier learn during the exchange, apart from the fact that the statement is true?
In this scenario, zkProofs provide a method to prove statements about NP-complete problems without compromising the privacy of the information involved.
- NP-complete problems are decision problems that:
- Return "yes" or "no" as output for any input
- When the answer is "yes", there is a short certificate (solution) proving it
- The certificate can be verified quickly, even though finding one may require brute-force search
All zkProofs must satisfy:
- Completeness - An honest Prover will eventually convince an honest Verifier
- Soundness - A dishonest Prover can never (except with negligible probability) convince the Verifier of a false statement
- Zero-knowledge - The interaction between Prover and Verifier reveals only whether the statement is true or false. Nothing else.
So by using zkProofs we can prove facts in a variety of contexts while preserving both privacy and correctness.
zkProofs are usually interactive, meaning:
- First, the Prover generates and sends a commitment for a computation
- The Verifier responds with a randomly generated challenge value
- The Prover then computes a response based on its initial commitment and the Verifier's challenge.
However, the interaction can be removed using the Fiat-Shamir heuristic, which turns interactive proofs into non-interactive ones. The main idea behind it is:
- The Prover performs the computation and uses a hash function to generate the challenge itself. This hash function takes as input the statement being proven and the commitment (a random value chosen by the Prover that serves as the first value for generating the non-interactive proof)
- Instead of the Verifier sending a random challenge value, it receives the proof and recomputes the hash based on the commitment.
It would look like:
Prover:
1. Generates commitment a₁
2. Computes c₁ = Hash(statement || a₁)
3. Computes response z₁
4. Generates a₂
5. Computes c₂ = Hash(statement || a₁ || z₁ || a₂)
6. Computes z₂
...and so on
Final proof = (a₁, z₁, a₂, z₂, ...)
Verifier:
1. Gets proof (a₁, z₁, a₂, z₂, ...)
2. Computes c₁ = Hash(statement || a₁)
3. Verifies (a₁, c₁, z₁) is valid
4. Computes c₂ = Hash(statement || a₁ || z₁ || a₂)
5. Verifies (a₂, c₂, z₂) is valid
...and so on
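The flow above can be sketched with a single round, using a toy Schnorr-style proof of knowledge of a discrete log and SHA-256 as the Fiat-Shamir hash. The parameters below are illustrative only and far too small to be secure:

```python
import hashlib
import secrets

# Toy group: p = 2q + 1 with q prime; g generates the subgroup of order q.
# These tiny parameters are for illustration only -- never use in production.
p, q, g = 23, 11, 2

def fs_challenge(statement: bytes, commitment: int) -> int:
    """Fiat-Shamir: derive the challenge by hashing statement || commitment."""
    digest = hashlib.sha256(statement + commitment.to_bytes(32, "big")).digest()
    return int.from_bytes(digest, "big") % q

def prove(x: int, statement: bytes):
    """Prover knows x such that y = g^x mod p."""
    k = secrets.randbelow(q)          # random nonce
    a = pow(g, k, p)                  # commitment a = g^k
    c = fs_challenge(statement, a)    # challenge from the hash, not from the Verifier
    z = (k + c * x) % q               # response
    return a, z

def verify(y: int, statement: bytes, a: int, z: int) -> bool:
    """Check g^z == a * y^c (mod p), recomputing c from the hash."""
    c = fs_challenge(statement, a)
    return pow(g, z, p) == (a * pow(y, c, p)) % p

x = 7                  # the Prover's secret
y = pow(g, x, p)       # public value: y = g^x mod p
stmt = b"I know x with y = g^x"
a, z = prove(x, stmt)
assert verify(y, stmt, a, z)
```

Note that the Verifier never talks to the Prover: it recomputes the challenge from the same hash, which is exactly the Fiat-Shamir trick.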
zkProof in Solana
There are two types of statements we may want to prove:
- Statements about facts -> Universal truths (1 + 1 = 2)
- Statements about knowledge -> "I know the factorization of N" -> these are known as proofs of knowledge, as they go beyond universal truths and rely on what the Prover knows.
As mentioned before, proofs can be interactive (Prover and Verifier engage in a series of rounds until the Verifier is convinced beyond reasonable doubt) or non-interactive, where the proof is delivered without communication between Prover and Verifier: the Prover generates a proof that encapsulates all the necessary information, and the Verifier can independently verify it without further interaction.
Zcash and Filecoin are applications using this latter technique.
- Zcash allows users to perform anonymous transactions
- Filecoin uses NIzkProofs (Non-Interactive zkProofs) to prove users have data stored without revealing the data itself.
zk-SNARKS and Circuits
- zk-SNARK stands for Zero-Knowledge Succinct Non-interactive ARgument of Knowledge -> particularly efficient and compact (hence "succinct")
- Considered succinct if the size of the proof and the time needed for verification grow more slowly than the computation being verified
- Based on polynomials and the Fiat-Shamir heuristic
- Regardless of complexity, the one thing attractive about zk-SNARKs for rollups is size -> a proof that all txs of a given L2 are valid is small enough to be verified on the L1.
- Circuits are a sequence of mathematical operations that take some inputs and produce outputs.
- Circuits are used in zkProofs to represent correctly performed computations without revealing the inputs.
They work as follows:
- Write and compile the circuit: create an arithmetic circuit defined by a prime and a computation -> represent the computation to be proved as a set of constraints
- Trusted Setup Ceremony: it may be necessary to run a ceremony to generate the proving and verifying keys
- Execute the circuit: the user enters private and public values and the output is calculated. The witness, or trace, is the record of all the computation steps.
- Generating the proof: given the proving key and the witness, the Prover can generate the zkProof that all constraints are satisfied, while only revealing the output.
- Verifying the proof: the Verifier checks the proof is correct for the public output using the submitted proof and its verifying key.
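To make "the computation as a set of constraints" concrete, here is a hand-rolled sketch using the classic toy example x³ + x + 5 = 35, flattened into gates. No real circuit compiler or prime field is involved; this only illustrates the constraint-checking idea:

```python
# Flatten x^3 + x + 5 == 35 into simple gates, R1CS-style.
# Each constraint relates one gate's output to earlier wires in the witness.

def compute_witness(x: int) -> dict:
    """Execute the circuit and record every intermediate wire (the trace)."""
    sym1 = x * x            # gate 1: x * x
    sym2 = sym1 * x         # gate 2: sym1 * x  (= x^3)
    sym3 = sym2 + x         # gate 3: sym2 + x
    out = sym3 + 5          # gate 4: sym3 + 5
    return {"x": x, "sym1": sym1, "sym2": sym2, "sym3": sym3, "out": out}

def check_constraints(w: dict, public_out: int) -> bool:
    """A verifier-side check that every gate holds for the witness."""
    return (
        w["sym1"] == w["x"] * w["x"]
        and w["sym2"] == w["sym1"] * w["x"]
        and w["sym3"] == w["sym2"] + w["x"]
        and w["out"] == w["sym3"] + 5
        and w["out"] == public_out
    )

w = compute_witness(3)              # the Prover's private input is x = 3
assert check_constraints(w, 35)     # all constraints hold; public output is 35
```

A real zk-SNARK replaces `check_constraints` with a succinct proof over these same constraints, so the Verifier never sees `x` or the intermediate wires.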
zk-STARKS
- zk-STARK is a Scalable Transparent ARgument of Knowledge
- Created by StarkWare, they allow blockchains to move computations to a single off-chain STARK prover and verify them with an on-chain STARK verifier.
- Considered zk because the inputs used off-chain are not exposed to the blockchain, keeping privacy.
- They are scalable because moving proving off-chain reduces costs on the L1
- They do not rely on trusted ceremonies but rather use verifiable randomness to set up the interaction between provers and verifiers. -> Proofs can only be generated by the off-chain prover that executed the computation.
- zk-STARKs completely avoid the need for elliptic curves, relying purely on hashes and information theory -> becoming quantum-resistant
- However, their proofs are larger in comparison to zk-SNARKs.
State Growth Problem in Solana
- Solana state is stored on the disks of full nodes -> AccountsDB. It is a key-value store where each entry is an account.
- Each account has an address (32 bytes) and can store up to 10MB of data (cost increases as more data is added). Storing 10MB costs ~70 SOL of rent, no matter if it is one account with 10MB or 1k accounts storing 10KB each.
- As of 2024 every day almost 1M new accounts are added to the chain.
- The current snapshot size is ~70GB (manageable). However, continuous growth would lead to state inefficiencies and potential bottlenecks -> as the snapshot grows, the time for a cold start after a hardware failure increases. This could be detrimental on a network restart.
- PCI bandwidth refers to data transfer between the CPU and peripheral devices (graphics cards, network, etc). High-speed PCIe (PCI Express) can provide transfer rates up to 128GB/s, which would limit Solana to ~1000 TPS if each tx reads or writes 128MB of state.
- Each validator needs to maintain an index of all existing accounts to check that a new account does not already exist. -> With a current state of ~500M accounts, even a minimal index of 32-byte keys (plus per-entry overhead) requires on the order of 32GB of RAM, which is expensive.
- With state growth then, the distinction between expensive fast memory (RAM) and slower cheaper memory (disk) is crucial.
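The bandwidth and index figures above can be sanity-checked with quick arithmetic (all numbers approximate, taken from the bullets above):

```python
# Back-of-the-envelope numbers from the state-growth discussion.
GB = 1024**3
MB = 1024**2

pcie_bandwidth = 128 * GB          # ~128 GB/s for high-end PCIe
bytes_per_tx = 128 * MB            # hypothetical tx touching 128 MB of state
tps_limit = pcie_bandwidth // bytes_per_tx
print(tps_limit)                   # 1024 -> roughly the quoted ~1000 TPS cap

accounts = 500_000_000
key_bytes = 32                     # address size; real index entries carry extra overhead
index_gb = accounts * key_bytes / GB
print(round(index_gb, 1))          # ~14.9 GB for the raw keys alone, before overhead
```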
- For transactions it also becomes interesting, as a tx needs to declare all the accounts it wants to access in advance. The tx layout is as follows:
- 3 bytes for the header
- 64 bytes per signature (per account)
- 32 bytes per address
- arbitrary instruction data, up to the 1232-byte packet limit
- 32 bytes for the recent blockhash
- Once the tx is received by a validator it:
- Checks the tx is valid: performs deduplication and verifies the fees, signatures, and structure
- The program bytecode is loaded and the SVM instantiated
- All accounts are checked and loaded into memory, then passed to the SVM
- Program is executed
- Updated state is synced back into storage
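The transaction layout fields listed above can be summed against the 1232-byte packet limit. A rough sketch (the exact serialization uses compact length prefixes and differs slightly):

```python
PACKET_LIMIT = 1232  # max serialized transaction size in bytes

def tx_size(num_signatures: int, num_accounts: int, ix_data_bytes: int) -> int:
    """Rough size estimate from the field sizes listed above."""
    header = 3
    signatures = 64 * num_signatures
    addresses = 32 * num_accounts
    blockhash = 32
    return header + signatures + addresses + ix_data_bytes + blockhash

# One signer touching 30 accounts leaves little room for instruction data:
size = tx_size(num_signatures=1, num_accounts=30, ix_data_bytes=100)
print(size, size <= PACKET_LIMIT)  # 1159 True
```

This is why passing account data and Merkle proofs inside the tx payload runs into size trouble so quickly.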
How can zkProof help solana ?
- Instead of storing all accounts on disk and reading them when needed, the tx could pass the account data as part of the tx payload.
- Proving the correct state using Merkle trees -> Merkle proofs are a way to commit to this data so it can later be verified against the commitment (stored on-chain) to validate that the correct state has been passed in.
- These proofs can be quite large: a tree containing 100k accounts needs a 544-byte proof. Providing such proofs for multiple accounts would easily exceed the 1232-byte limit. -> Can be solved using constant-size commitments
- zk Compression is a mechanism for addressing Merkle proof size: it is a way of proving that some computation has been done correctly without the costs of on-chain storage.
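The Merkle-proof idea can be sketched as follows (a toy tree with SHA-256; real state trees differ in hash function and node layout). Note how the proof size is depth × 32 bytes, which is where the 544 bytes for ~100k accounts (depth 17) comes from:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    """Build a full binary Merkle tree; returns all levels, root last."""
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        lvl = levels[-1]
        levels.append([h(lvl[i] + lvl[i + 1]) for i in range(0, len(lvl), 2)])
    return levels

def merkle_proof(levels, index):
    """Sibling hashes from leaf to root -- this is what a tx would carry."""
    proof = []
    for lvl in levels[:-1]:
        sibling = index ^ 1                     # the neighbour at this level
        proof.append((lvl[sibling], index % 2 == 1))
        index //= 2
    return proof

def verify_proof(leaf, proof, root):
    """Recompute the root from leaf + siblings; compare to the on-chain commitment."""
    node = h(leaf)
    for sibling, node_is_right in proof:
        node = h(sibling + node) if node_is_right else h(node + sibling)
    return node == root

leaves = [f"account-{i}".encode() for i in range(8)]   # 8 accounts -> depth 3
levels = build_tree(leaves)
root = levels[-1][0]                                   # 32-byte on-chain commitment
proof = merkle_proof(levels, 5)                        # proof is depth * 32 bytes
assert verify_proof(leaves[5], proof, root)
```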
zkCompression
- It is a new primitive that allows developers to compress on-chain state -> reducing state costs while preserving security.
- Leverages zero-knowledge proofs to validate state transitions without exposing the underlying data.
- It groups multiple accounts into a single verifiable Merkle root stored on chain while the underlying data is stored on the ledger.
- These validity proofs are succinct zkProofs used to prove the existence of N accounts as leaves across M state trees while maintaining a constant 128-byte proof size
- These accounts in the Merkle tree, however, are not regular Solana accounts; they are compressed accounts.
Compressed Account Model
These accounts are similar to regular Solana accounts but store zk-compressed state. They have several key differences that enhance their efficiency:
- Hash identification - each compressed account is identified by its hash
- Hash changes on write - any write to a compressed account changes its hash
- Optional address - a permanent address can optionally be set as the ID of a compressed account. This is useful for certain use cases, but it is optional, since the goal is to avoid overhead: accounts can be referenced by their hash instead
- Sparse state trees - all compressed accounts are stored in Merkle trees; only the root of the tree is stored on-chain
Compressed PDAs can be identified by their unique persistent address. Following a similar layout to a regular PDA account, their data field also enshrines the AccountData structure.
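A toy illustration of "hash identification" and "hash changes on write", assuming a hypothetical field layout and SHA-256 (the real compressed-account hashing scheme uses a different preimage and hash function):

```python
import hashlib

# Hypothetical sketch: the account's identity is a hash over its contents,
# so any write necessarily produces a new hash. The field layout below is
# illustrative only -- not the actual compressed-account scheme.

def compressed_account_hash(owner: bytes, lamports: int, data: bytes) -> bytes:
    preimage = owner + lamports.to_bytes(8, "little") + data
    return hashlib.sha256(preimage).digest()

owner = b"\x01" * 32
before = compressed_account_hash(owner, 1_000, b"state v1")
after = compressed_account_hash(owner, 1_000, b"state v2")
assert before != after   # any write changes the account's hash/identity
```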
Nodes on zk Compression
There are different types of nodes in solana zk compression:
- Photon RPC nodes: they index the compression programs, enabling clients to read and build transactions that interact with the compressed state. The canonical compression indexer is named Photon and is provided by Helius. It can be run with minimal setup and only needs to be pointed at an existing RPC node
- Prover nodes: they generate the validity proofs for state inclusion. Proofs can be fetched with getValidityProof from the zk Compression API specification. They can be operated standalone or bundled with another RPC node; the Photon RPC node implementation comes with a Prover node
- Light Forester nodes: these nodes manage the creation, rollover, and updating of shared and program-owned state trees. Meant for developers who want to choose their own program-owned state trees.
- They work in conjunction with Photon nodes and help distribute the proof-generation load
- They can also be used for quick reads, reducing load on the main Photon nodes
- They provide faster responses
Workflow
- Photon nodes delegate proof tasks to Light Foresters
- Light Foresters generate proofs in parallel
- Results are aggregated back at the Photon nodes
graph TB
subgraph User Actions
TX[Transaction] --> |Submit| RPC[RPC Endpoint]
end
subgraph Compression Infrastructure
RPC --> |Forward| PN[Photon Node]
PN --> |Process| ZKP[Generate ZK Proof]
PN --> |Maintain| MT[Merkle Tree State]
PN --> |Index| DB[(Compressed Data Store)]
subgraph Light Forester Network
LF1[Light Forester Node 1]
LF2[Light Forester Node 2]
LF3[Light Forester Node 3]
PN --> |Delegate Proofs| LF1 & LF2 & LF3
LF1 & LF2 & LF3 --> |Return Proofs| PN
end
end
subgraph Solana Network
ZKP --> |Submit| Val[Validator]
MT --> |Verify| Val
Val --> |Update| State[On-chain State]
end
subgraph Query Services
DB <--> |Query| API[Query API]
API <--> |Serve| Apps[dApps]
LF1 & LF2 & LF3 <--> |Fast Queries| API
end
style User Actions fill:#f9f9f9,stroke:#333,stroke-width:2px
style Compression Infrastructure fill:#e6f3ff,stroke:#333,stroke-width:2px
style Light Forester Network fill:#e6ffe6,stroke:#333,stroke-width:2px
style Solana Network fill:#f0f0f0,stroke:#333,stroke-width:2px
style Query Services fill:#fff0f0,stroke:#333,stroke-width:2px
Disadvantages and limitations
- Trust assumptions:
- Anyone can run one of the aforementioned nodes and store the raw data needed for generating proofs and submitting txs. This introduces a trust assumption that impacts the liveness of the compressed state.
- If the data is lost or delayed, transactions cannot be submitted unless the data is stored personally
- Since only one honest node is needed and the proofs are self-verifiable, the issue lies in the liveness and potential censorship of the data.
- The program that verifies compressed accounts is upgradeable. While this opens the door for potential fixes or new requirements, it is a trust assumption until the program is made immutable (frozen)
- The use of Forester nodes. These nodes maintain state-root advancement and manage queues, emptying nullifier queues and advancing state roots. Account hashes are replaced with zeroes to nullify them. This ensures instant finality of compressed state transitions while keeping tx size within Solana's constraints.
- Emptying the queues is important for the zk compression protocol, as a full queue would cause a liveness failure for the associated state tree. People still need to run these nodes to ensure the protocol's integrity and liveness
- Limitations: even when there is nothing to hide, zkProofs help turn problems that require multiple steps into verifying a single proof that the computations were run correctly. Still, zk compression has trade-offs:
- Larger tx size: zk compression requires 128 bytes for the validity proof, and the data to be read/written must be sent on-chain
- Higher CU usage: zk compression increases CU usage; it requires ~100k CU for proof verification, ~100k CU for system use, and ~6k CU per compressed account read or write.
- Per-tx state cost: each write operation incurs a small network cost, as it must nullify the previous compressed account state and append the new compressed state to the state tree. Thus, a compressed account's lifetime cost could surpass its uncompressed equivalent if it requires many updates.
- Use regular accounts if:
- The account is updated frequently
- The lifetime number of writes will be large
- The account stores large amounts of data that must be accessed in on-chain transactions.
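Using the rough CU figures from the limitations above, a quick estimate of a compressed transaction's compute budget (approximate numbers from this section):

```python
# Approximate CU costs quoted above for zk compression.
PROOF_VERIFY_CU = 100_000
SYSTEM_CU = 100_000
PER_ACCOUNT_CU = 6_000

def compressed_tx_cu(num_accounts: int) -> int:
    """Rough total CU for a tx touching `num_accounts` compressed accounts."""
    return PROOF_VERIFY_CU + SYSTEM_CU + PER_ACCOUNT_CU * num_accounts

print(compressed_tx_cu(4))   # 224000 -- a large share of the 1.4M CU per-tx cap
```

This fixed ~200k CU overhead is why frequently updated accounts are better left uncompressed.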
Benefits
- zk compression allows apps to scale to millions of users by storing state in cheaper ledger space while keeping on-chain storage at a minimum via state fingerprinting. Ex: minting 10k token accounts would cost ~5,200 USD in rent (SOL at 260 USD), while with zk compression it is only about a dollar
- The structure is almost identical to a solana regular account
- It supports parallelism → two transactions under the same state tree (commitment) that access different compressed accounts can be executed in parallel
- Leverages atomic composability → a tx that lists n compressed accounts and m regular accounts is completely valid. An ix referencing a compressed account can call another ix or program referencing a regular account → true even if the accounts are compressed under different state trees. If one ix fails, the entire tx is rolled back, and changes are visible from one ix to the next.
- This is unlike zk rollups, which cannot call each other synchronously or atomically unless they use locks
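The 10k-token-account figure above can be reproduced with rough arithmetic (the ~0.002 SOL rent-exempt minimum per token account is an approximation; SOL price as quoted above):

```python
# Approximate numbers: a token account's rent-exempt minimum is ~0.002 SOL.
RENT_SOL_PER_TOKEN_ACCOUNT = 0.002
SOL_PRICE_USD = 260
NUM_ACCOUNTS = 10_000

uncompressed_usd = NUM_ACCOUNTS * RENT_SOL_PER_TOKEN_ACCOUNT * SOL_PRICE_USD
print(round(uncompressed_usd))   # ~5200 USD, matching the figure above
```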
Zk compression is not a rollup
There are two kind of rollups:
- Optimistic - all txs are assumed valid for a given period of time, and fraud proofs are used to challenge invalid txs within this timeframe
- zk Rollups - Tx are proved instantly valid or invalid using validity proofs.
The entire state of a zk rollup is represented as a single root on the base layer (e.g., Ethereum). This is not the case for zk compression.
On a zk rollup, a scenario involving 500 tx would store a single proof confirming the state changed from A to B after the txs are executed.
In zk compression, each tx generates its own proof to verify the correctness of the account data. The txs are executed by the SVM, and the accounts are treated as regular accounts once each proof is validated.
Key Differences:
- ZK Rollup:
- Batches all 500 transactions together
- Processes them in a single circuit
- Generates one proof for entire batch
- Updates state root in single operation
- Requires L1 contract interaction
- Solana ZK Compression:
- Each transaction processed independently
- Individual proof per transaction
- Parallel proof generation and validation
- Direct SVM integration
- No batching or rollup required
- Treats compressed accounts like regular ones
zk Compression Execution
- Pre-Execution:
- Transaction is submitted to Photon node
- Proof is generated by Light Forester nodes
- Proof validates the compressed state transition
- During Execution:
- Validator receives transaction + proof
- Validator FIRST verifies the proof
- Only after proof verification does the transaction execute
- State is updated based on transaction execution
- Important Distinctions:
- Proof validates state transition correctness
- Proof verification happens before execution
- Transaction execution happens on-chain
- State update is the final step
This differs from rollups because:
- No batch waiting
- Proofs are validated before execution
- Each transaction maintains independence
- State updates happen immediately after individual execution
State update
After transaction execution, two types of states need to be maintained:
- On-chain State:
- Merkle root is stored on-chain
- Modified account data is stored on-chain
- Account compression program maintains these
- No additional manual updates needed
- Off-chain Merkle Tree State:
- Maintained by Photon nodes
- Stores the full compressed data structure
- Automatically synchronized across network
- No manual updates required
Key Points:
- The on-chain updates happen automatically as part of transaction execution
- Photon nodes handle off-chain merkle tree updates
- As a user/developer, you don't need to make additional updates
- The system maintains consistency between both states
The beauty of this system is that once your transaction is executed:
- The on-chain state is automatically updated
- Photon nodes automatically update the merkle tree
- No additional action is required from the user/developer
Poseidon Syscalls
Poseidon is a family of hash functions designed for zk proof systems. It is far cheaper to compute inside arithmetic circuits than traditional hash functions like SHA-256. Solana exposes Poseidon hashing through syscalls so on-chain programs can verify proofs that use it.
Interoperability
Solana as a zk chain is a highly performant layer 1 blockchain with low fees and runtime support for elliptic curve operations. The introduction of alt_bn128 syscalls opened the door for composability between Solidity and Solana contracts, mirroring Ethereum's precompiled contracts for elliptic curve operations. These operations facilitate zkSNARK proof verification within ETH gas limits, making it easier to port Ethereum zk tooling to Solana.
SIMD-0075 proposes enabling projects to use off-chain proof generation and on-chain verification for storing Merkle commitments. The new syscalls introduced there also allow for bridging and interoperability, making Solana a possible L2 for Ethereum if zk compression interoperability is achieved. Celestia is one example of this direction.