

Table of Contents
1. Introduction
2. History and Evolution of Ethereum Sharding
2-1. Early Ethereum sharding efforts: Hypercubes, Hub and Spoke Chains, Super Quadratic Sharding, Quadratic Sharding
2-2. The pursuit of simplicity and pragmatism: Full execution sharding → Data sharding
3. Danksharding
3-1. Danksharding, a new blockchain architecture for data sharding
3-2. Minimizing centralization: PBS (Proposer Builder Separation) and crList
3-3. Ensuring scalability and trust: DAS, Erasure Coding, and KZG Commitments
4. EIP-4844: Proto-Danksharding
4-1. EIP-4844, the cornerstone of Danksharding
4-2. Simply reducing calldata cost can cause block size issues
4-3. Structure and creation of blob transactions
5. Impact of EIP-4844 on Rollup Costs
5-1. DA (L1 publication) costs currently account for over 90% of total rollup costs
5-2. With the implementation of EIP-4844, the DA cost of rollups is expected to be almost free, and a significant increase in the blob fee would require the demand for rollups to grow by more than 10 times
5-3. Changing the cost structure of blobs is being actively discussed within the Ethereum community
6. Closing Thoughts
Annotation) Types of Ethereum data storage
A1. The main storage spaces in the EVM are categorized as Storage, Memory, Stack, and Calldata
A2. The space used by rollups is called Calldata
1. Introduction
Alphas are hidden in EIPs. In the month before and after each of the EIP-1559 (London), EIP-3675 (The Merge), and EIP-4895 (Shanghai) upgrades, ETH prices rose 10% to 80%. Now, the next upgrade is the Deneb-Cancun combined Dencun hard fork, scheduled for the end of this year. And the most anticipated change in Dencun is EIP-4844, a.k.a. Proto-Danksharding, the very first step toward implementing Ethereum’s sharding roadmap and a proposal that will dramatically reduce the operational cost of rollups.
This report is set up in two parts. The first follows in the footsteps of Ethereum sharding, tracing how the sharding design has changed over time and how it led to Ethereum’s long-term vision of Danksharding. In the second part, we'll look at the structure and implications of Proto-Danksharding and envision how the economic structure of rollups will change after EIP-4844.
2. History and Evolution of Ethereum Sharding
2-1. Early Ethereum sharding efforts: Hypercubes, Hub and Spoke Chains, Super Quadratic Sharding, Quadratic Sharding
In retrospect, the scalability solutions proposed by the nascent Ethereum community, such as Hypercubes, Super Quadratic Sharding, and Hub and Spoke Chains, were notably audacious and ahead of their time. In particular, hypercubes and hub and spoke chains were proposed by Vitalik in late 2014, before the Ethereum mainnet was even launched (see “Scalability, Part 2: Hypercubes”). While hub and spoke chains were an early form of today’s Polkadot Relay Chain-Parachain structure, hypercubes emerged as an answer to the flaws of hub and spoke chains, their main advantage being improved transaction speed via cross-substate messaging. The model was unfortunately not adopted due to its many attack vectors and complex implementation, but these ideas were later vital in shaping the quadratic sharding* model that appeared on the ETH2 roadmap. Sharding on Ethereum has since undergone three major shifts that produced the sharding model we know today.
*Quadratic sharding: The blockchain is separated into a beacon chain and 64 shard chains; each shard processes transactions in parallel and submits its headers to the beacon block. Each beacon block contains transactions for all 64 shards, and each shard block is validated by one of 64 committees composed of Ethereum's validators. The committees are randomly assembled through a process called random sampling (see “Sharding: The Future of the Ethereum Blockchain”).
A significant shift transpired between late 2017 and early 2018 as the roadmap for Serenity, or ETH2, started to solidify. Decisively, complex theories such as super quadratic sharding, which stacks shards upon shards much like fractal scaling, and exponential sharding were set aside in favor of first implementing quadratic sharding. This redirection enabled Ethereum to concentrate its R&D efforts solely on quadratic sharding. A clear indication of this strategic shift can be found in Justin Drake's "Sharding Phase 1 Spec" from March 2018, where super quadratic sharding is slated for Phase 6, the ultimate stage.
The second pivotal shift took place in the latter half of 2019, when the development of crosslinks* was set aside in favor of a unilateral delivery of shard blocks to the Beacon Chain (as documented on Github). Initially, crosslinks were designed to interconnect the Beacon and shard chains, facilitating communication between them, as depicted in the figure below (where 'M' represents the main chain, or Beacon chain, and 'S' stands for the shard chain). By discarding this approach, only the transaction flow from the shard chains to the Beacon chain needed attention. Simply put, the previous model required crosslinks for interchain connectivity, a requirement eliminated in the redefined structure.
*Crosslink: A signature from beacon chain validators or a committee that has validated a shard chain block, used by the beacon chain to verify the latest state of the shard chain or to interact with it.
Chain cross-linking | Source: Vitalik Buterin
The last major shift arrived in 2020 with Vitalik's unveiling of the "Rollup-centric ethereum roadmap." As the phrase "rollup-centric roadmap" implies, the role of rollups in the Ethereum ecosystem has expanded significantly ever since. The idea was to assign rollups the task of transaction execution while using shards merely for data availability. This marked Ethereum's strategic move from execution sharding to data sharding.
Ethereum Roadmap | Source: Vitalik Buterin
2-2. The pursuit of simplicity and pragmatism: Full execution sharding → Data sharding
The trajectory of Ethereum's sharding can be characterized as a persistent pursuit of reduction and simplification. From Ethereum's inception, sharding was consistently proposed as a scalability solution. However, given its technical complexity and implementation challenges, the concept of sharding gradually pivoted toward practicality and simplicity. Naturally, it evolved from full execution sharding, where each shard processed all transactions, to data sharding, where shards merely store transaction data executed by the rollups.
The revised roadmap seeks to use the shard chain solely as a data availability layer for rollups, while tasking rollups with transaction execution. The consensus layer's role here is to guarantee the data availability of shards. To effectuate this, Dankrad Feist, a researcher at the Ethereum Foundation, proposed in his December 2021 article "New sharding design with tight beacon and shard block integration," a strategy for a single block builder to generate a beacon block encapsulating all transactions from each shard. At the same time, he also introduced Proposer Builder Separation (PBS) to decouple the roles of proposer and builder, thereby mitigating centralization in the block creation process. This proposal resonated with the Ethereum community and was integrated into the official Ethereum roadmap, leading to the birth of Danksharding.
3. Danksharding
3-1. Danksharding, a new blockchain architecture for data sharding
In ETH2 sharding, Ethereum validators are randomly sampled to form committees, which take turns simultaneously creating and validating 64 shard blocks. However, without data availability sampling (DAS), validators must store the entire shard data themselves and provide proof of its availability. This system relies on the assumption that all nodes are honest, as there is no way to verify whether a particular validator is deliberately withholding data. Moreover, the committee must manually tally each node's vote for shard blocks, which can cause delays and prevent timely inclusion of shard blocks in the beacon block. As such, the earlier sharding designs were quite complex to implement and carried many risks.
In Danksharding, a single block builder produces one large block containing all shard blobs*, which is then validated and voted on by a committee. This effectively solves the previously mentioned problem of shard blocks not being promptly included in beacon blocks. The centralization of the block production process arising from this structure can be minimized through the introduction of Proposer Builder Separation (PBS). PBS is a method to encode the principle of "centralized block production, decentralized block validation" that Vitalik laid out in Endgame.
*Blob: Short for Binary Large Object, a blob is a new data type that is stored only on the Beacon Chain. The concept of blobs was first introduced alongside Danksharding and is expected to be used primarily for storing rollup data.
3-2. Minimizing centralization: PBS (Proposer Builder Separation) and crList
Initially outlined in "Proposer/Block Builder Separation-friendly Fee Market Designs," PBS was primarily developed to decentralize MEV extraction. However, its design and principles meshed well with the Danksharding architecture, leading to its subsequent integration within that structure.
MEV is predominantly captured by a small number of powerful nodes that have the hardware to monitor the mempool swiftly and the resources to develop advanced MEV algorithms. PBS revolves around the idea of splitting the role of a block producer into a proposer and a builder, offering nodes with lower computing power and fewer resources a chance to share in MEV. Builders are powerful validators capable of developing advanced MEV algorithms, responsible for ordering transactions and creating blocks. Proposers, on the other hand, have the power to decide which of the blocks proposed by builders will be recorded on the chain.
With PBS, even the most computationally powerful nodes have to share MEV gains with proposers. This stands in contrast to the previous process, in which a single validator had a monopoly on transaction inclusion and block generation. In this arrangement, the proposer receives the bid amount, and the builder receives the MEV minus transaction fees and the bid amount.
It's important to note that during the bidding process, builders publish only the block header and the desired bid price first, instead of the entire block data. Releasing the full block data upfront could allow other builders to copy the transaction and take the MEV. This is why the full block data is only released after the proposer has selected the block.
Two-slot PBS architecture | Source: Two-slot proposer/builder separation
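To make this auction concrete, here is a minimal sketch in Python under simplified assumptions. The names and numbers are purely illustrative, not part of any actual client implementation; in reality, headers and bids are exchanged over the consensus layer's p2p network.

```python
from dataclasses import dataclass

@dataclass
class Bid:
    builder: str
    block_header: str  # only the header is revealed at bid time
    bid_amount: float  # ETH offered to the proposer

def select_winning_bid(bids: list[Bid]) -> Bid:
    """The proposer commits to the header carrying the highest bid;
    the winning builder reveals the full block body only afterwards."""
    return max(bids, key=lambda b: b.bid_amount)

# Illustrative numbers: each builder extracts some MEV and bids part of it away.
bids = [
    Bid("builder_a", "0xheader_a", bid_amount=0.30),
    Bid("builder_b", "0xheader_b", bid_amount=0.45),
]
winner = select_winning_bid(bids)

# Hypothetical payoff split described in the text:
# the proposer earns the bid; the builder keeps MEV minus fees and the bid.
mev, fees = 0.60, 0.05
proposer_revenue = winner.bid_amount              # 0.45 ETH
builder_revenue = mev - fees - winner.bid_amount  # ~0.10 ETH
print(proposer_revenue, builder_revenue)
```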
One problem with PBS is that it delegates block creation authority to builders, potentially compromising censorship resistance. To address this, the Ethereum community is planning to introduce the censorship resistance list (crList). Once crList is implemented, the transaction flow is expected to be as follows (a sketch in code follows the list):
- A proposer broadcasts a crList containing all valid transactions in their mempool to a builder.
- The builder constructs a block based on the transactions in the crList and submits it to the proposer. In doing so, the builder includes a transaction hash in the block body that proves that it contains all the transactions from the crList.
- The proposer selects the block with the highest bid, constructs the block header, and notifies the nodes.
- The builder submits a block and a proof that it has included all transactions from the crList.
- The block is added to the chain. If the builder does not submit a proof, the block will not be accepted by the fork choice rule.
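The following is a minimal, conceptual sketch of this flow in Python. The data structures and the "proof" check are simplified stand-ins, since the actual crList specification is still under discussion; none of these names come from a real client.

```python
# A conceptual sketch of the crList flow described above.
def profitability(tx: str) -> float:
    """Placeholder fee estimator; a real builder runs its MEV pipeline here."""
    return float(len(tx))  # dummy metric for the sketch

def build_block(crlist: set[str], mempool: list[str]) -> tuple[list[str], set[str]]:
    """Builder: order transactions for profit, but include every crList tx."""
    block_txs = sorted(mempool, key=profitability, reverse=True)
    for tx in crlist - set(block_txs):
        block_txs.append(tx)            # censoring a crList tx is not allowed
    proof = crlist & set(block_txs)     # stand-in for a real inclusion proof
    return block_txs, proof

def fork_choice_accepts(block_txs: list[str], proof: set[str], crlist: set[str]) -> bool:
    """Nodes reject any block that cannot prove full crList inclusion."""
    return proof == crlist and crlist.issubset(block_txs)

crlist = {"tx_a", "tx_b"}               # broadcast by the proposer
block, proof = build_block(crlist, ["tx_c", "tx_a"])
assert fork_choice_accepts(block, proof, crlist)
```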
3-3. Ensuring scalability and trust: DAS, Erasure Coding, and KZG Commitments
In addition to PBS and crList, the secure and efficient implementation of Danksharding requires the preemptive introduction of several other technologies, most notably: 1) Data Availability Sampling (DAS), 2) Erasure Coding, and 3) KZG Commitment.
First, DAS addresses the potential centralization and scalability issues that can arise with the introduction of Danksharding. With the growing adoption of Ethereum, not only the number of rollups but also the amount of data these rollups need to process will grow, resulting in an exponential increase in the amount of data nodes need to store. This hinders scalability and leads to centralization, as only the few nodes capable of handling such rapidly increasing data volumes will survive (also known as the data availability problem). DAS ensures data availability (DA) while reducing the burden of large data on nodes, encouraging more nodes to join the network. This is also what makes it more secure and reliable than decentralized file systems such as BitTorrent and IPFS, which facilitate data upload but do not guarantee DA.
With DAS, nodes verify that at least 50% of the data in blobs is available without downloading all the data. DA is guaranteed whenever availability exceeds 50%, and the secret behind this is erasure coding using Reed-Solomon codes. The essence of erasure coding is to extend the original data to twice its size in a structured way, so that even if up to 50% of the extended data is lost, the entire data can be recovered from the remaining 50%.

Vitalik Buterin writes in An explanation of the sharding + DAS proposal that the simplest mathematical analogy for how erasure coding works is the idea that “two points are always enough to recover a line.” For example, if a file consists of the four points (1, 4), (2, 7), (3, 10), and (4, 13) on one line, then any two of those points can be used to reconstruct the line and compute the remaining two. This assumes that the x coordinates 1, 2, 3, 4 are fixed parameters of the system and not the file creator’s choice. Extending this idea with higher-degree polynomials, one can create 3-of-6 files, 4-of-8 files, and generally n-of-2n files for arbitrary n. These files have the property that any n of the 2n points suffice to compute the missing points. That is, even if only 50% of the erasure-coded data remains, the entire data can be reconstructed.

However, there must be safeguards in place to ensure that the erasure coding is done properly: if a blob is filled with garbage data instead of a correctly computed extension, the data becomes irrecoverable. This is where the KZG commitment comes into play.
Erasure coding mechanism | Source: An explanation of the sharding + DAS proposal
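Vitalik's line example can be reproduced in a few lines of code. The sketch below uses Lagrange interpolation over the rationals to recover the "file" from any two of its four points, then shows why random sampling catches withheld data so quickly. Production systems do this over a finite field with Reed-Solomon codes, so treat this purely as an illustration of the principle.

```python
from fractions import Fraction

def lagrange_interpolate(points, x):
    """Evaluate the unique lowest-degree polynomial through `points` at x."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total

# The 2-of-4 "file" from the text: four points on one line (y = 3x + 1).
file_points = [(1, 4), (2, 7), (3, 10), (4, 13)]

# Any two surviving points are enough to recover the missing two.
survivors = [file_points[0], file_points[3]]          # keep (1,4) and (4,13)
recovered = [(x, lagrange_interpolate(survivors, x)) for x in (2, 3)]
assert recovered == [(2, 7), (3, 10)]

# Sampling argument: if more than half the extended data is withheld, each
# random sample hits a missing chunk with probability >= 1/2, so unavailable
# data slips past k samples with probability at most (1/2)**k.
for k in (10, 30, 75):
    print(k, 0.5 ** k)
```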
Similar to the proof systems in rollups, there are two ways to verify erasure coding: fraud proofs and validity proofs. While Celestia, a layer-1 blockchain built with the Cosmos SDK that provides data availability, will use fraud proofs to ensure the integrity of erasure-coded data, the Ethereum community has decided to use a cryptographic commitment scheme, specifically the 2D KZG commitment. With 2D KZG commitments, the availability of the entire data can be confirmed with near-certainty within 75 iterations of DAS. However, like SNARKs, the scheme requires an initial trusted setup, making the solution less than perfect. KZG commitments rely on a secret value called a CRS (Common Reference String), determined during the initial setup phase. If a prover learns or controls this value, they can forge proofs that verify as true even if they don't actually have the data. As a result, some in the Ethereum community are proposing to use 2D KZG commitments for now and switch to STARK proofs in the future. For more details, see KZG Polynomial Commitments by Dankrad Feist.
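The trusted setup risk can be illustrated with a toy model. Real KZG hides the secret inside elliptic curve points and uses pairings for verification; the sketch below instead commits to a polynomial by evaluating it in plain modular arithmetic, which is insecure by design but shows why a leaked CRS secret is fatal. All values here are illustrative assumptions.

```python
P = 2**255 - 19  # illustrative prime modulus (not the BLS12-381 field)

def evaluate(coeffs, x):
    """Evaluate a polynomial (lowest-degree coefficient first) at x mod P."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

secret = 123456789          # the setup secret ("toxic waste"); must be destroyed
data = [1, 4, 7, 10]        # honest data encoded as polynomial coefficients
commitment = evaluate(data, secret)

# Anyone who knows the secret can forge different data with the SAME commitment
# by shifting a fake polynomial's constant term to match at the secret point.
fake = [99, 99, 99, 99]
fake[0] = (fake[0] + commitment - evaluate(fake, secret)) % P
assert evaluate(fake, secret) == commitment  # the forgery verifies as "true"
```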
4. EIP-4844: Proto-Danksharding
Though the exact specification of Danksharding has yet to be finalized, and its actual deployment is expected to be years away, the Ethereum community has made the proactive decision to incorporate part of the groundwork for Danksharding, namely EIP-4844, into the protocol during the upcoming Dencun upgrade.
4-1. EIP-4844, the cornerstone of Danksharding
Despite its name, Proto-Danksharding, EIP-4844 does not actually shard the Ethereum database. Instead, EIP-4844 aims to 1) pre-introduce some of the logic required for a smooth transition of the protocol architecture to Danksharding in the future, and 2) introduce blob-carrying transactions.
Blobs, the centerpiece of EIP-4844, stand for Binary Large Objects. Put simply, they are chunks of data attached to a transaction. Unlike regular transactions, blob data is stored only on the Beacon Chain and incurs very low gas fees. The purpose of blobs is to dramatically reduce rollups' DA (L1 publication) costs by creating a storage space exclusively dedicated to data availability, independent of blockspace. Currently, all rollups use the calldata space to write their data to Ethereum, which is expected to be replaced with blobs. For more information about Ethereum's data storage and why rollups currently use calldata, see the annotation at the end of this report.
4-2. Simply reducing calldata cost can cause block size issues
The idea of reducing the DA cost of rollups has been tossed around since before EIP-4844. Initially, the aim was to simply reduce the calldata fee (gas cost), but this was quickly dismissed due to block size issues. Suppose we want to cut calldata gas fees to one-tenth. The average block size on Ethereum today is 120KB, with a theoretical maximum of around 1.8MB (assuming all 30M gas is spent on calldata). If the cost of calldata were reduced by a factor of 10, the average block size would still be manageable, but the maximum block size would balloon to 18MB, far beyond what the network can handle. In other words, simply reducing calldata gas costs can seriously bloat the network.
Average block size on Ethereum | Source: etherscan
A refinement of the above idea is EIP-4488, proposed in November 2021. EIP-4488 aims to reduce the calldata gas cost from 16 gas per byte to 3 gas per byte, while enforcing a maximum block size of 1.4MB.
The main concept behind EIP-4488 is to boost rollup usage by lowering calldata costs by a factor of roughly 5.3 (16/3), while simultaneously avoiding excessive block size growth through a hard cap on the maximum block size.
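The arithmetic behind both proposals is easy to check. The snippet below computes the theoretical maximum block size under the assumption, stated above, that the entire 30M block gas limit is spent on calldata; these are worst-case figures, not observed block sizes.

```python
GAS_LIMIT = 30_000_000  # Ethereum block gas limit

def max_block_size_mb(gas_per_byte: float) -> float:
    """Worst case: every unit of gas in the block is spent on calldata."""
    return GAS_LIMIT / gas_per_byte / 1_000_000

print(max_block_size_mb(16))   # ~1.9 MB  -> today's theoretical maximum
print(max_block_size_mb(1.6))  # ~18.8 MB -> the naive 10x calldata discount
print(max_block_size_mb(3))    # ~10 MB   -> EIP-4488 pricing, which is why
                               #             it adds a separate 1.4MB hard cap
```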
EIP-4844, on the other hand, creates a new data type called blobs and introduces a separate fee market for them, which lowers the DA (L1 publication) cost of rollups. While the implementation of EIP-4844 may take longer than EIP-4488, whose intuitive and simple logic would allow a faster deployment, EIP-4844 has the advantage of introducing the updates needed for Danksharding in advance. Once EIP-4844 ships, execution clients will be ready for Danksharding, requiring only consensus clients to upgrade over time, which is another reason EIP-4844 was adopted.
4-3. Structure and creation of blob transactions
Now, let's delve deeper into blobs. Each blob consists of 4,096 field elements of 32 bytes each; since a field element holds slightly less than a full 32 bytes of payload, a blob carries roughly 125KB of data*. According to the updated EIP-4844 document, a target of 3 blobs will be added to each block, with a maximum of 6, putting the blob space per block between roughly 375KB and 750KB. Upon the introduction of Danksharding, the target and maximum number of blobs will increase to 128 and 256, respectively, allowing for a blob space of 16MB to 32MB.
*In fact, 125KB is in no way a small amount of data. Assuming 1 byte per character and an average of 6 characters per word, 125KB is enough to hold about 21,000 English words. According to ChatGPT, 21,845 English words would fill about 77 A4 sheets (font Arial, size 10, line spacing 2).
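For reference, the figures above follow directly from the EIP-4844 constants, as the sketch below recomputes. Note that the raw blob size is 128KB (4,096 × 32 bytes); the oft-quoted ~125KB refers to usable payload, since each field element must encode a value below the BLS12-381 field modulus and so carries slightly less than 32 bytes.

```python
FIELD_ELEMENTS_PER_BLOB = 4096
BYTES_PER_FIELD_ELEMENT = 32
TARGET_BLOBS_PER_BLOCK, MAX_BLOBS_PER_BLOCK = 3, 6    # EIP-4844
DANKSHARDING_TARGET, DANKSHARDING_MAX = 128, 256      # planned future values

blob_kb = FIELD_ELEMENTS_PER_BLOB * BYTES_PER_FIELD_ELEMENT / 1024  # 128 KB raw
# ~125KB of that is usable payload, since field elements aren't full 32 bytes.

print(f"per block today: {TARGET_BLOBS_PER_BLOCK * blob_kb:.0f}-"
      f"{MAX_BLOBS_PER_BLOCK * blob_kb:.0f} KB")      # ~384-768 KB raw
print(f"under Danksharding: {DANKSHARDING_TARGET * blob_kb / 1024:.0f}-"
      f"{DANKSHARDING_MAX * blob_kb / 1024:.0f} MB")  # ~16-32 MB
```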