Barretenberg
The ZK-SNARK library at the core of Aztec
Aztec's goal is to enable private verifiable execution of smart contracts. This motivates a proving system design where:
Efficient recursion supports low memory proving - statements can be decomposed via recursion into smaller statements that require less prover memory.
Chonk builds on PlonK [2], sharing its foundation:
Its deviations from PlonK are motivated by the goals above:
For a video presentation, see [10].
Chonk implements Repeated Computation with Global state (RCG) as defined in the Stackproofs paper [3]. It combines HyperNova folding with Goblin to produce a single succinct proof.
Unlike Incrementally Verifiable Computation (IVC), RCG has the following properties:
This makes RCG well-suited for private smart contract execution where global constraints (like nullifier uniqueness or public state consistency) must be verified across all circuits.
Circuits are accumulated in alternating pairs:
A Chonk proof (ChonkProof_<IsRecursive>) is a unified template structure that works for both native and recursive verification modes:
Proof components:
Verification Architecture:
The Chonk verifier performs verification in stages:
- `MegaZKVerifier::reduce_to_pairing_check` to verify the Hiding kernel
- `aggregate_multiple`
- `ChonkVerifier::ReductionResult` with aggregated pairing points and IPA claim for deferred verification

Note on deferred verification: IPA claims and pairing points are propagated through the rollup:
- in `RollupIO` public inputs through tx_merge → block_merge → checkpoint_root → checkpoint_merge
- at each level, claims from child proofs are accumulated via `IPA::accumulate`
- finally verified in-circuit in the root rollup via `IPA::full_verify_recursive`

This amortizes the cost of IPA verification across many proofs.
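To make the deferred-verification output concrete, the sketch below shows one plausible shape for a reduction result. The field names and types are illustrative assumptions, not the actual `ChonkVerifier::ReductionResult` definition.

```cpp
// Hypothetical shape of the reduction result described above. Field names and
// types are illustrative assumptions, not the actual ChonkVerifier::ReductionResult.
#include <array>
#include <cstdint>

struct G1Point { uint64_t x[4], y[4]; }; // BN254 group element stand-in
struct IPAClaim { };                     // IPA opening claim, verified only in the root rollup

struct ReductionResult {
    // Aggregated KZG pairing points; the final pairing check on these two points
    // is deferred and carried upward through the rollup's public inputs.
    std::array<G1Point, 2> pairing_points;
    // IPA claim accumulated at each rollup level via IPA::accumulate and fully
    // verified in-circuit only in the root rollup.
    IPAClaim ipa_claim;
};
```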
This section explains the motivation behind Chonk's design. Skip if you're familiar with the problem.
Assumption: All circuits have fixed size $N = 2^{21}$ from the verifier's perspective.
In a standard recursive verification setup, each circuit must verify the previous circuit's proof. A Honk verifier performs:
The commitment batching can be slightly optimized to use short scalars (Fiat-Shamir challenges are ~127 bits), but the recursive verifier circuit still exceeds 512K gates - too large for practical use.
Goblin improves the naive approach by delegating non-native EC operations to a separate ECCVM circuit. Instead of performing scalar muls directly in-circuit (expensive), we:
Cost reduction: EC operations become "free" in the main circuit - just op queue entries. However, each recursive verification still adds significant work to the op queue:
- NUM_UNSHIFTED_ENTITIES + NUM_SHIFTED_ENTITIES for MegaFlavor

In ECCVM terms, full scalar muls cost ~2× the rows of short scalar muls. So each recursive verification effectively adds **~100+ ECCVM ops**.
Note: The Merge protocol's ECCVM cost is present in both the naive Goblin approach and in Chonk - it's inherent to maintaining the op queue across circuits.
For a chain of $k$ circuits, ECCVM must handle $O(k \cdot ($ NUM_UNSHIFTED_ENTITIES + NUM_SHIFTED_ENTITIES $+ 2\log N))$ short scalar operations.
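A minimal sketch of the op queue idea, with hypothetical names (not Barretenberg's actual `ECCOpQueue` interface): the main circuit only records each EC operation, and the ECCVM later proves that the recorded operations were executed correctly.

```cpp
// Hypothetical op queue sketch: the main circuit records EC operations instead
// of performing non-native scalar muls in-circuit; the ECCVM proves them later.
#include <cstdint>
#include <vector>

struct Fr { uint64_t limbs[4]; };   // scalar; "short" means a ~127-bit Fiat-Shamir challenge
struct G1 { uint64_t x[4], y[4]; }; // BN254 point stand-in

struct EccOp {
    G1 point;
    Fr scalar;
    bool is_short; // short-scalar muls cost roughly half the ECCVM rows of full ones
};

struct EcOpQueue {
    std::vector<EccOp> ops;

    // Instead of an expensive non-native scalar mul in the main circuit, we only
    // append an op queue entry; this is what makes EC ops "free" in-circuit.
    void defer_mul(const G1& point, const Fr& scalar, bool is_short) {
        ops.push_back({point, scalar, is_short});
    }

    // Rough ECCVM row estimate: full-scalar muls ~ 2x the rows of short-scalar muls.
    size_t eccvm_row_estimate(size_t rows_per_short_mul) const {
        size_t rows = 0;
        for (const auto& op : ops) {
            rows += op.is_short ? rows_per_short_mul : 2 * rows_per_short_mul;
        }
        return rows;
    }
};
```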
The key insight of HyperNova folding is: we don't need to complete PCS verification at each step. Instead:
After Sumcheck, the verifier has NUM_ALL_ENTITIES evaluation claims (commitment + claimed value at point $r$). These are batched using random challenges $\rho_i, \sigma_j$ into just two claims:
Each circuit's Sumcheck produces claims at a different random point $r_i$. We need to reduce individual openings at given points to the opening of a random linear combination at a new random challenge.
The solution is MultilinearBatchingSumcheck: a small Sumcheck (only 6 witness columns) that reduces two claims at different points $(r_{\text{acc}}, r_{\text{inst}})$ to a single claim at a common point $r_{\text{new}}$. See HyperNova Folding Details for the full protocol.
In MultilinearBatchingVerifier::compute_new_claim:
- NUM_UNSHIFTED_ENTITIES + 1 short scalar muls (55 + 1 for MegaFlavor) - batching instance commitments + accumulator
- NUM_SHIFTED_ENTITIES + 1 short scalar muls (5 + 1 for MegaFlavor) - same for shifted

Total: 62 short scalar multiplications per circuit, versus 55 short + 21 full in the naive/Goblin approaches. Crucially:
| Approach | EC Ops per Circuit | Final MSM Size | Notes |
|---|---|---|---|
| Naive Recursive | 55 + 21 full scalar muls | 78 | Circuit > 512K gates |
| Goblin (no folding) | 60 short + 21 full (op queue) | >102 short | ~100 ECCVM rows/circuit |
| Chonk (HyperNova + Goblin) | 62 short scalar muls (op queue) | N/A (deferred) | Shplemini deferred to decider |
By combining UltraHonk features such as custom gates, the databus mechanism for inter-circuit communication, HyperNova sumcheck-based folding, and the Goblin protocol for deferring elliptic curve operations, we obtain a client-friendly RCG that can run on a mobile phone.
A key benefit of Chonk's folding approach is that prover memory is bounded by the largest individual circuit, not the total computation size.
Memory bound: Peak memory occurs during the first Sumcheck round and is bounded by $1.5 \times \max_i |\text{ProverPolynomials}_i|$, where the max is over all input circuits. Crucially, this bound is independent of the number of circuits being folded.
Example: 55 dense polynomials of size $2^{18}$ consume $1.5 \times 55 \times 2^{18} \times 32\text{ bytes} \approx 660\text{ MB}$ (shifted polynomials share memory with unshifted), which serves as a rough upper bound for a Mega circuit's RAM footprint during Chonk.
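A quick sanity check of that arithmetic; the constants mirror the example above and are illustrative, not Barretenberg's exact memory accounting.

```cpp
// Back-of-the-envelope check of the memory estimate above (illustrative constants).
#include <cstdio>

int main() {
    const double num_polys   = 55;       // dense prover polynomials (shifted ones share memory)
    const double circuit_len = 1 << 18;  // 2^18 rows
    const double field_bytes = 32;       // one BN254 scalar field element
    const double peak_factor = 1.5;      // overhead during the first Sumcheck round
    const double peak_bytes  = peak_factor * num_polys * circuit_len * field_bytes;
    std::printf("peak ~ %.0f MiB\n", peak_bytes / (1024.0 * 1024.0)); // ~660 MiB
    return 0;
}
```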
Chonk uses HyperNova [5] for folding circuits into accumulators. Each circuit goes through Sumcheck to produce evaluation claims, which are batched into an accumulator. The MultilinearBatchingSumcheck protocol combines accumulators at different evaluation points into a single accumulator at a common point.
See HyperNova Folding Details for the full protocol specification.
Goblin handles non-native EC operations by deferring them to an op queue, then proving correct execution via:
Each circuit produces a subtable of ECC operations that must be merged into the global op queue. The Merge protocol proves this was done correctly. See MERGE_PROTOCOL.md for full details including the degree check and ZK analysis.
What it proves: For each of 4 wire columns $j$:
$$M_j(X) = L_j(X) + X^k \cdot R_j(X)$$
where $M_j$ = merged table, $L_j$ = new subtable, $R_j$ = existing table, $k$ = shift.
Commitments to $L_j$ and $R_j$ are reused from the HyperNova transcript, avoiding redundant work.
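The identity can be illustrated on raw coefficient vectors. The toy check below is only for intuition; the actual Merge protocol enforces the relation via commitments and evaluation at a random challenge, not coefficient-by-coefficient.

```cpp
// Toy coefficient-level check of M(X) = L(X) + X^k * R(X).
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using Poly = std::vector<uint64_t>; // stand-in for field-element coefficients

bool check_merge(const Poly& M, const Poly& L, const Poly& R, size_t k) {
    size_t n = std::max(M.size(), std::max(L.size(), k + R.size()));
    for (size_t i = 0; i < n; ++i) {
        uint64_t m = i < M.size() ? M[i] : 0;
        uint64_t l = i < L.size() ? L[i] : 0;
        uint64_t r = (i >= k && i - k < R.size()) ? R[i - k] : 0;
        if (m != l + r) return false; // compare the coefficient of X^i on both sides
    }
    return true;
}

int main() {
    Poly L = {1, 2, 3};       // new subtable occupies the low-degree coefficients
    Poly R = {7, 8};          // existing table, shifted up by X^k
    size_t k = 3;
    Poly M = {1, 2, 3, 7, 8}; // merged table
    assert(check_merge(M, L, R, k));
    return 0;
}
```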
The databus enables data passing between circuits through commitment equality checks.
Traditional public inputs (PI) require the verifier to hash the complete PI data to generate challenges. This hashing must happen in-circuit for recursive verification, and in-circuit hashing is expensive. For large data transfers between circuits (e.g., state reads/writes, function call stacks), this quickly becomes prohibitive.
The databus solves this by using commitments instead of raw data. Rather than passing raw public data to the verifier, we commit to the data and perform consistency checks directly on those commitments. The cost to verify a commitment equality is $O(1)$, independent of the data size.
| Column | Purpose |
|---|---|
| calldata | Input from previous kernel's return data ($C_i$) |
| secondary_calldata | Input from previous app's return data ($C'_i$) |
| return_data | Output to be consumed by next circuit ($R_i$) |
App circuits only produce return_data (no calldata). Kernel circuits receive both:
- `calldata` from the previous kernel's return data
- `secondary_calldata` from the corresponding app's return data

Values are read from databus columns using a log-derivative lookup argument. This allows dynamic indexing: circuit logic can access `databus[witness_index]` where the index itself is a witness value.
For a bus column $b$ with read counts $a$, read selector $q_{busread}$, and wires $(w_1, w_2)$ representing (value, index), the lookup identity is:
$$\sum_{i=0}^{n-1}\frac{a_i}{b_i + i\beta + \gamma} - \frac{q_{busread,i}}{w_{1,i} + w_{2,i}\beta + \gamma} = 0$$
In practice, we precompute an inverse polynomial $I$ where:
$$I_i = \frac{1}{(b_i + i\beta + \gamma)(w_{1,i} + w_{2,i}\beta + \gamma)}$$
This allows expressing the lookup as two subrelations:
$$I_i \cdot (b_i + i\beta + \gamma)(w_{1,i} + w_{2,i}\beta + \gamma) - \varepsilon_i = 0$$
$$\sum_{i=0}^{n-1} a_i \cdot I_i \cdot (w_{1,i} + w_{2,i}\beta + \gamma) - q_{busread,i} \cdot I_i \cdot (b_i + i\beta + \gamma) = 0$$
The is_active flag $\varepsilon$ indicates when the inverse $I$ needs to be computed. It is given by the expression q_busread OR read_tags = q_busread + read_tags - q_busread * read_tags. A third subrelation, read_tags * read_tags = read_tags, ensures that read_tags entries are boolean, which is required for the OR formula to work correctly.
Multiple columns: Each bus column (calldata, secondary_calldata, return_data) has separate subrelations, distinguished by column-specific selectors $q_j$.
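The log-derivative identity can be checked numerically on a toy trace. The sketch below uses a small stand-in prime field and evaluates the sum from the identity above directly; the real relation is enforced over the BN254 scalar field inside Sumcheck via the inverse polynomial $I$.

```cpp
// Toy check of the log-derivative lookup identity over a small prime field.
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint64_t P = 2147483647; // 2^31 - 1, stand-in for the real field

uint64_t mul(uint64_t a, uint64_t b) { return (a * b) % P; }
uint64_t add(uint64_t a, uint64_t b) { return (a + b) % P; }
uint64_t sub(uint64_t a, uint64_t b) { return (a + P - b % P) % P; }
uint64_t pow_mod(uint64_t b, uint64_t e) {
    uint64_t r = 1;
    for (; e; e >>= 1, b = mul(b, b))
        if (e & 1) r = mul(r, b);
    return r;
}
uint64_t inv(uint64_t a) { return pow_mod(a, P - 2); } // Fermat inverse

int main() {
    // Bus column b, read counts a, read selector q, and read wires (value, index).
    std::vector<uint64_t> b  = {10, 20, 30};
    std::vector<uint64_t> a  = {2, 0, 1};    // entry 0 read twice, entry 2 once
    std::vector<uint64_t> q  = {1, 1, 1};    // every row performs a read
    std::vector<uint64_t> w1 = {30, 10, 10}; // values read
    std::vector<uint64_t> w2 = {2, 0, 0};    // indices read
    uint64_t beta = 17, gamma = 23;          // stand-ins for transcript challenges

    // sum_i [ a_i/(b_i + i*beta + gamma) - q_i/(w1_i + w2_i*beta + gamma) ] == 0
    uint64_t acc = 0;
    for (size_t i = 0; i < b.size(); ++i) {
        uint64_t bus_denom  = add(add(b[i], mul(i, beta)), gamma);
        uint64_t read_denom = add(add(w1[i], mul(w2[i], beta)), gamma);
        acc = add(acc, mul(a[i], inv(bus_denom)));
        acc = sub(acc, mul(q[i], inv(read_denom)));
    }
    assert(acc == 0);
    return 0;
}
```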
The databus columns are populated from ACIR constraints generated by the Noir compiler. When a Noir program uses call_data or return_data intrinsics, the compiler generates BlockConstraint operations that:
- `set_values()`

Notation: πᵢ denotes the proof of folding the i-th kernel, π'ᵢ denotes the proof of folding the i-th app, and PI denotes public inputs.
The key insight: circuit Kᵢ₊₁ verifies the data transfer between Kᵢ₋₁ and Kᵢ. It has access to [Rᵢ₋₁] through public inputs and [Cᵢ] through the proof πᵢ.
Kernel K₀ (first kernel):
Kernel K₁:
Kernel Kᵢ (general case, i ≥ 2):
Tail Kernel Kₙ₋₁:
- `PrivateToRollupKernelCircuitPublicInputs` containing final accumulated data

Hiding Kernel Kₙ:
- `call_data`
- `HN_FINAL`
- `PrivateToRollupKernelCircuitPublicInputs` as its public output

This protocol ensures that data passed between circuits is consistent without requiring the verifier to see or hash the actual data: only commitment equality checks on O(1)-sized commitments.
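A minimal sketch of the consistency check itself, with hypothetical types (the real commitments are BN254 group elements produced by the PCS): the kernel compares two commitments, so the cost is independent of how much data was transferred.

```cpp
// Hypothetical sketch of the O(1) databus consistency check a kernel performs.
#include <cstdint>

struct Commitment { // stand-in for a BN254 group element
    uint64_t x[4];
    uint64_t y[4];
    bool operator==(const Commitment& other) const {
        for (int i = 0; i < 4; ++i) {
            if (x[i] != other.x[i] || y[i] != other.y[i]) return false;
        }
        return true;
    }
};

// K_{i+1} verifies the transfer between K_{i-1} and K_i: [R_{i-1}] from public
// inputs must equal [C_i] extracted from proof pi_i.
bool check_databus_transfer(const Commitment& return_data_prev, const Commitment& calldata_curr) {
    return return_data_prev == calldata_curr;
}
```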
Chonk Proof Verification:
There are two verification paths:
1. Native verification (`Chonk::verify` / `bb verify --scheme chonk`)
2. In-circuit verification inside the rollup:
   - `PrivateTxBaseRollup`: for private-only txs - verifies the Chonk proof + processes the tx (updates trees, validates fees, etc.)
   - `PublicChonkVerifier`: for public txs - verifies the Chonk proof in parallel with AVM verification

Both paths verify the hiding kernel's MegaZK proof + Goblin proof (merge, ECCVM, translator). Output: `PrivateToRollupKernelCircuitPublicInputs`, consumed by the rollup.
A Chonk proof must reveal nothing about the private execution. ZK is achieved through multiple layers:
The op queue contains EC operations from all circuits and must be hidden:
- **hide_op_queue_accumulation_result**: Hides the final accumulator point
- **hide_op_queue_content_in_tail**: Protects tail kernel op queue data
- **hide_op_queue_content_in_hiding**: Final ZK protection in the Hiding kernel

Problem: The final merge step uses APPEND mode. If the merged table size varied with transaction complexity, an observer could infer information about the transaction from the proof structure.
Solution: We always merge to a uniform total size = OP_QUEUE_SIZE. In the code, shift_size is set to (OP_QUEUE_SIZE - |hiding_ops|) × NUM_ROWS_PER_OP, which represents the total degree of the prepended table and places the hiding kernel's ops at fixed positions at the end of the table, regardless of how many ops the actual transaction used.
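A small sketch of that computation with placeholder constants (the real `OP_QUEUE_SIZE` and `NUM_ROWS_PER_OP` values are defined in the Barretenberg codebase):

```cpp
// Sketch of the uniform-size computation described above (placeholder constants).
#include <cstddef>

constexpr size_t OP_QUEUE_SIZE   = 1 << 12; // placeholder: fixed total number of ops
constexpr size_t NUM_ROWS_PER_OP = 2;       // placeholder: table rows per EC op

// The hiding kernel's ops always land at the end of a table of fixed total size,
// so the merged table's shape leaks nothing about the transaction's complexity.
constexpr size_t shift_size(size_t num_hiding_ops) {
    return (OP_QUEUE_SIZE - num_hiding_ops) * NUM_ROWS_PER_OP;
}

static_assert(shift_size(64) == (OP_QUEUE_SIZE - 64) * NUM_ROWS_PER_OP, "fixed-size merge");
```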
Security - Zero Padding is Enforced: The soundness argument has two parts:
See Appendix: Zero Padding Security for the detailed M_tail lifecycle and full soundness argument.
The hiding kernel:
- `op_queue` data

For each circuit (alternating app/kernel):
During accumulation:
Kernels must call complete_kernel_circuit_logic after adding user logic:
This adds:
| Class | Description |
|---|---|
| HypernovaFoldingProver | Prover-side folding operations |
| HypernovaFoldingVerifier<Flavor> | Verifier-side folding (native or recursive) |
Each circuit is converted to claims via Sumcheck:
1. Sumcheck produces the evaluation point r = (r₀, r₁, ..., rₙ₋₁)
2. All NUM_ALL_ENTITIES entities are evaluated at r
3. This yields NUM_ALL_ENTITIES evaluation claims

For MegaFlavor: NUM_ALL_ENTITIES = 60 evaluations (55 unshifted + 5 shifted).
The individual evaluation claims are batched using random linear combinations:
1. Generate batching scalars:
where $N_u$ = NUM_UNSHIFTED_ENTITIES, $N_s$ = NUM_SHIFTED_ENTITIES, and $\rho_i$, $\sigma_j$ are transcript challenges.

2. Batch polynomials:
$$p_{\text{unshifted}} = p_0 + \sum_{i=1}^{N_u-1} \rho_i \cdot p_i$$
$$p_{\text{shifted}} = p_{0,\text{shifted}} + \sum_{j=1}^{N_s-1} \sigma_j \cdot p_{j,\text{shifted}}$$
3. Batch evaluations:
$$v_{\text{unshifted}} = p_0(r) + \sum_{i=1}^{N_u-1} \rho_i \cdot p_i(r)$$
$$v_{\text{shifted}} = p_{0,\text{shifted}}(r) + \sum_{j=1}^{N_s-1} \sigma_j \cdot p_{j,\text{shifted}}(r)$$
4. Batch commitments:
$$[p_{\text{unshifted}}] = [p_0] + \sum_{i=1}^{N_u-1} \rho_i \cdot [p_i]$$
$$[p_{\text{shifted}}] = [p_0] + \sum_{j=1}^{N_s-1} \sigma_j \cdot [p_j]$$
The resulting accumulator contains $(r, v_{\text{unshifted}}, v_{\text{shifted}}, [p_{\text{unshifted}}], [p_{\text{shifted}}])$.
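As a small illustration of the random-linear-combination step, the helper below batches a vector of evaluation claims exactly as in the formulas above, over a stand-in prime field (the real code works over the BN254 scalar field and batches the commitments in the same way):

```cpp
// Toy illustration of batching evaluation claims via a random linear combination,
// mirroring the v_unshifted formula above (stand-in prime field, not BN254).
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint64_t P = 2147483647; // 2^31 - 1

uint64_t mul(uint64_t a, uint64_t b) { return (a * b) % P; }
uint64_t add(uint64_t a, uint64_t b) { return (a + b) % P; }

// v_batched = v_0 + sum_{i=1}^{n-1} rho_i * v_i, with rhos[i-1] playing the role of rho_i.
uint64_t batch_evaluations(const std::vector<uint64_t>& evals, const std::vector<uint64_t>& rhos) {
    assert(rhos.size() + 1 == evals.size());
    uint64_t acc = evals[0];
    for (size_t i = 1; i < evals.size(); ++i) {
        acc = add(acc, mul(rhos[i - 1], evals[i]));
    }
    return acc;
}
```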
Prover Accumulator (MultilinearBatchingProverClaim):
Verifier Accumulator (MultilinearBatchingVerifierClaim):
1. Instance to Accumulator - Convert a circuit instance into an initial accumulator:
2. Fold - Combine an accumulator with a new instance:
The fold operation:
The prover passes six witness columns to MultilinearBatchingProver to fold:
- `batched_unshifted_accumulator`, `batched_unshifted_instance` - batched polynomials
- `batched_shifted_accumulator`, `batched_shifted_instance` - batched shifted polynomials
- `eq_accumulator` = $\text{eq}(X, r_{\text{acc}})$ - selects the accumulator evaluation point
- `eq_instance` = $\text{eq}(X, r_{\text{inst}})$ - selects the instance evaluation point
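The `eq_accumulator` / `eq_instance` columns are tabulations of the multilinear $\text{eq}(X, r)$ polynomial over the Boolean hypercube. A sketch of building such a table, again over a stand-in prime field:

```cpp
// Sketch of tabulating eq(X, r) = prod_k ( x_k r_k + (1-x_k)(1-r_k) ) over {0,1}^n
// (toy field; the real code works over the BN254 scalar field).
#include <cstdint>
#include <utility>
#include <vector>

constexpr uint64_t P = 2147483647; // 2^31 - 1

uint64_t mul(uint64_t a, uint64_t b) { return (a * b) % P; }
uint64_t one_minus(uint64_t a) { return (1 + P - a) % P; }

std::vector<uint64_t> eq_table(const std::vector<uint64_t>& r) {
    std::vector<uint64_t> table = {1};
    for (uint64_t r_k : r) {
        std::vector<uint64_t> next(table.size() * 2);
        for (size_t i = 0; i < table.size(); ++i) {
            next[2 * i]     = mul(table[i], one_minus(r_k)); // x_k = 0 branch
            next[2 * i + 1] = mul(table[i], r_k);            // x_k = 1 branch
        }
        table = std::move(next);
    }
    return table; // 2^n entries, one per Boolean assignment of X
}
```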