Title: The Evolution of Data Access in Web3
Authors: Geng Kai, Eric, DFG
The Importance of Data in Blockchain
Data is crucial to blockchain technology and serves as the foundation for developing decentralized applications (dApps). While much of the current discussion revolves around data availability (DA) – ensuring that every network participant can access recent transaction data for verification – there is another equally important aspect that is often overlooked: data accessibility.
In the era of modular blockchains, DA solutions have become indispensable. These solutions ensure that all participants can access transaction data to achieve real-time verification and maintain the integrity of the network. However, a DA layer functions more like a billboard than a database: data is not stored indefinitely but is deleted over time, just as a new poster replaces an old one on a billboard.
Data accessibility, on the other hand, focuses on the ability to retrieve historical data, which is crucial for dApp development and blockchain analysis. It is vital for tasks that require access to past data to ensure accurate representation and execution. Although data accessibility is discussed less often than data availability, it is equally important. The two play different but complementary roles in the blockchain ecosystem, and a comprehensive data management approach must address both to support robust and efficient blockchain applications.
How Blockchain Data Was Retrieved Previously
Since its inception, blockchain has fundamentally transformed infrastructure and driven the creation of decentralized applications (dApps) in various fields such as gaming, finance, and social networking. However, building these dApps requires access to a significant amount of blockchain data, which is both challenging and costly.
For dApp developers, one option is to host and run their own archival RPC nodes. These nodes store all historical blockchain data from the beginning, allowing full access to the data. However, maintaining archival nodes is costly, and their query capabilities are limited, making it challenging to query data in the format developers need. While running cheaper nodes is an option, their data retrieval capabilities are limited, which may hinder the operation of dApps.
Another approach is to use commercial RPC node providers. These providers are responsible for the cost and management of nodes and provide data through RPC endpoints. Public RPC endpoints are free but have rate limits, which may negatively impact the user experience of dApps. Private RPC endpoints offer better performance by reducing congestion, but even simple data retrieval requires a lot of back-and-forth communication, making their requests cumbersome and inefficient for complex data queries. Furthermore, private RPC endpoints are often difficult to scale and lack compatibility across different networks.
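To make the back-and-forth concrete, the following is a minimal sketch of the JSON-RPC payloads a client might have to send just to collect one token's transfer events over a block range. It uses the standard `eth_getLogs` method; the per-request cap of 2,000 blocks, the placeholder token address, and the block range are assumptions for illustration, since actual limits vary by provider.

```python
import json

# keccak256("Transfer(address,address,uint256)"), the standard ERC-20 Transfer event topic
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def build_get_logs_requests(token_address, from_block, to_block, max_range=2000):
    """Split [from_block, to_block] into eth_getLogs payloads of at most max_range blocks."""
    requests = []
    start = from_block
    while start <= to_block:
        end = min(start + max_range - 1, to_block)
        requests.append({
            "jsonrpc": "2.0",
            "id": len(requests) + 1,
            "method": "eth_getLogs",
            "params": [{
                "fromBlock": hex(start),
                "toBlock": hex(end),
                "address": token_address,
                "topics": [TRANSFER_TOPIC],
            }],
        })
        start = end + 1
    return requests


# Hypothetical token address and block range, for illustration only.
payloads = build_get_logs_requests("0x" + "00" * 20, 18_000_000, 18_099_999)
print(len(payloads))  # number of round trips needed before any filtering or joining
print(json.dumps(payloads[0], indent=2))
```

Under these assumptions, a 100,000-block window already takes 50 separate requests, and the client still has to decode, filter, and join the results itself.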
A Better Alternative: Blockchain Indexers
Blockchain indexers play a crucial role in organizing on-chain data and sending it to a database for query purposes, which is why they are often referred to as the “Google of blockchain.” They work by indexing blockchain data and making it available at all times through a query language similar to SQL (using APIs like GraphQL). By providing a unified interface for querying data, indexers allow developers to quickly and accurately retrieve the information they need using standardized query languages, thus simplifying the process significantly.
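As a rough illustration of what such a query can look like, the sketch below builds the JSON body of a GraphQL request that asks an indexer for the most recent swaps in a single pool. The entity name `swaps`, its fields, and the pool identifier are hypothetical; a real indexer defines its own schema, so this shows the shape of a request rather than any specific API.

```python
import json


def build_graphql_request(pool_id, first=5):
    """Return the JSON body for a GraphQL POST request listing recent swaps in one pool."""
    # The schema below (swaps, timestamp, amountUSD) is assumed for illustration.
    query = """
    query RecentSwaps($pool: String!, $first: Int!) {
      swaps(first: $first, orderBy: timestamp, orderDirection: desc, where: { pool: $pool }) {
        id
        timestamp
        amountUSD
      }
    }
    """
    return json.dumps({"query": query, "variables": {"pool": pool_id, "first": first}})


# Placeholder pool identifier, not a real address.
body = build_graphql_request("0xPOOL_ID_PLACEHOLDER")
print(body)
```

A single request of this kind replaces the many raw RPC calls that would otherwise be needed to find, decode, and sort the same events.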
Different types of indexers optimize data retrieval in various ways:
1. Full Node Indexers: These indexers run full blockchain nodes and extract data directly from them, ensuring data integrity and accuracy but requiring significant storage and processing power.
2. Lightweight Indexers: These indexers rely on full nodes to fetch specific data as needed, reducing storage requirements but potentially increasing query times.
3. Specialized Indexers: These indexers are tailored for certain types of data or specific blockchains, optimizing retrieval for specific use cases, such as NFT data or DeFi transactions.
4. Aggregate Indexers: These indexers extract data from multiple blockchains and sources, including off-chain information, providing a unified query interface, which is particularly useful for multi-chain dApps.
Ethereum alone requires about 3TB of storage on an Erigon archival node, and this volume will keep increasing as the blockchain grows. Indexer protocols deploy multiple indexers that index and query large amounts of data efficiently and at high speed, something RPCs cannot achieve.
Indexers also allow for complex queries, easy filtering of data by different criteria, and extraction of data for analysis. Some indexers even aggregate data from multiple sources, avoiding the need to deploy multiple APIs in multi-chain dApps. Because they are distributed across multiple nodes, indexers provide enhanced security and performance, while RPC providers may face disruptions and downtime due to their centralized nature.
Overall, compared to RPC node providers, indexers improve the efficiency and reliability of data retrieval while also reducing the cost of deploying individual nodes. This makes blockchain indexer protocols the preferred choice for dApp developers.
Use Cases for Indexers
As mentioned earlier, building dApps requires retrieving and reading blockchain data to operate their services. This includes any type of dApp, including DeFi, NFT platforms, games, and even social networks, as these platforms need to read data before executing other transactions.
DeFi
DeFi protocols require different information to quote specific prices, ratios, fees, etc. Automated market makers (AMMs) need price and liquidity information about certain pools to calculate swap rates, while lending protocols need utilization to determine lending rates and debt ratios for liquidation. These dApps must load this information before they can calculate the rates at which users transact.
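The dependence on fresh pool data can be sketched with two small calculations. The example below assumes a constant-product (x*y=k) AMM with a 0.3% fee and a linear interest-rate model; both are common designs chosen here for illustration, not formulas taken from any protocol named in this article, and all reserve figures and rate parameters are made up.

```python
def constant_product_quote(amount_in, reserve_in, reserve_out, fee=0.003):
    """Output amount for a swap against an x*y=k pool, with the fee taken from the input."""
    amount_in_after_fee = amount_in * (1 - fee)
    return reserve_out * amount_in_after_fee / (reserve_in + amount_in_after_fee)


def utilization(total_borrowed, available_cash):
    """Share of supplied funds currently lent out, between 0 and 1."""
    supplied = total_borrowed + available_cash
    return 0.0 if supplied == 0 else total_borrowed / supplied


def borrow_rate(util, base_rate=0.02, slope=0.20):
    """Linear rate model: the borrow rate rises with utilization (illustrative parameters)."""
    return base_rate + slope * util


# Hypothetical pool state, as it might be read from an indexer.
usdc_out = constant_product_quote(amount_in=10, reserve_in=1_000, reserve_out=2_000_000)
pool_utilization = utilization(total_borrowed=800, available_cash=200)
print(round(usdc_out, 2), pool_utilization, round(borrow_rate(pool_utilization), 4))
```

Every input to these functions (reserves, borrowed amounts, available cash) is on-chain state, which is why stale or slow data directly translates into wrong quotes.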
Games
GameFi needs fast indexing and data access so that users can play smoothly. Only with lightning-fast data retrieval and execution can Web3 games compete with Web2 games on performance and attract more users. These games require data on land ownership, in-game token balances, in-game operations, etc. With indexers, they can maintain a stable data flow and stable uptime, which supports a smooth gaming experience.
NFT
NFT markets and lending platforms need indexed data access to various information, such as NFT metadata, ownership and transfer data, royalty information, etc. Quickly indexing such data can avoid the need to browse each NFT individually to find ownership or NFT attribute data.
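A minimal sketch of what "indexing ownership" means in practice: replay transfer events in chain order once, keep the latest owner per token, and answer ownership questions from that table instead of inspecting each NFT individually. The event records, contract label, and owner names below are hypothetical sample data.

```python
def build_ownership_index(transfer_events):
    """Replay transfers in chain order and return {(contract, token_id): current_owner}."""
    owners = {}
    ordered = sorted(transfer_events, key=lambda e: (e["block"], e["log_index"]))
    for event in ordered:
        # A later transfer of the same token overwrites the earlier owner.
        owners[(event["contract"], event["token_id"])] = event["to"]
    return owners


def tokens_owned_by(owners, address):
    """List the (contract, token_id) pairs currently held by one address."""
    return sorted(key for key, owner in owners.items() if owner == address)


# Hypothetical transfer events, deliberately out of order.
events = [
    {"contract": "0xNFT", "token_id": 1, "to": "bob", "block": 105, "log_index": 3},
    {"contract": "0xNFT", "token_id": 1, "to": "alice", "block": 100, "log_index": 0},
    {"contract": "0xNFT", "token_id": 2, "to": "bob", "block": 100, "log_index": 1},
]
owners = build_ownership_index(events)
print(tokens_owned_by(owners, "bob"))
```

An indexer performs this replay continuously as new blocks arrive, so a marketplace can look up holdings with one query.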
Whether it’s an AMM in DeFi that needs price and liquidity information or a SocialFi application that needs to update new user posts, the ability to retrieve data quickly is crucial for the smooth operation of dApps. With indexers, they can efficiently and accurately retrieve data, providing a seamless user experience.
Analytics
Indexers provide a way to extract specific data from raw blockchain data (including smart contract events in each block). This provides an opportunity for more specific data analysis, offering comprehensive insights.
For example, a perpetual trading protocol can identify which tokens have high trading volumes and which generate the most fees, helping it decide whether to list these tokens as perpetual contracts on its platform. DEX developers can create dashboards for their products to gain deep insights into which pools have the highest returns or strongest liquidity. Public dashboards can also be created, allowing developers to freely query any type of data to display on charts.
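The kind of aggregation behind such a dashboard can be sketched in a few lines. The example assumes the indexer has already decoded trade events into simple records; the field names and figures are hypothetical sample data.

```python
from collections import defaultdict


def summarize_by_token(trades):
    """Aggregate volume, fees, and trade count per token, sorted by volume (highest first)."""
    totals = defaultdict(lambda: {"volume_usd": 0.0, "fees_usd": 0.0, "trades": 0})
    for trade in trades:
        row = totals[trade["token"]]
        row["volume_usd"] += trade["volume_usd"]
        row["fees_usd"] += trade["fees_usd"]
        row["trades"] += 1
    return sorted(totals.items(), key=lambda item: item[1]["volume_usd"], reverse=True)


# Hypothetical decoded trade records.
trades = [
    {"token": "ETH", "volume_usd": 1000.0, "fees_usd": 3.0},
    {"token": "ARB", "volume_usd": 250.0, "fees_usd": 0.75},
    {"token": "ETH", "volume_usd": 500.0, "fees_usd": 1.5},
]
ranking = summarize_by_token(trades)
for token, stats in ranking:
    print(token, stats)
```

A listing decision or a public chart is then a matter of reading the top rows of this ranking.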
With multiple blockchain indexers available, identifying the differences between indexing protocols is crucial to ensuring developers choose the indexer that best suits their needs.
Overview of Blockchain Indexers
Indexer Overview: The Graph
The Graph is the first indexer protocol launched on Ethereum, making it easy to query transaction data that was previously difficult to access. It uses subgraphs to define and filter subsets of data collected from the blockchain, such as all transactions related to the Uniswap v3 USDC/ETH pool, which can then be queried easily using a query language similar to SQL.
Using proof of indexing, indexers stake the native token GRT to provide indexing and query services, and delegators can choose to stake their tokens with these indexers. Curators identify high-quality subgraphs to help indexers determine which subgraphs to index for optimal query fees. As The Graph transitions towards greater decentralization, it will eventually cease its hosted service and require subgraphs to upgrade to its network, while offering incentives to upgrade indexers.
This infrastructure brings the average cost per million queries down to about $40, which is significantly lower than the cost of self-hosted nodes. With support for file data sources, it also allows for parallel indexing of on-chain and off-chain data to facilitate efficient data retrieval.
The Graph’s indexer rewards have been steadily increasing over the past few quarters. This is partly due to the increase in query volume, and partly due to the rise in the token price as The Graph plans to integrate AI-assisted queries in the future.
Indexer Overview: Subsquid
Subsquid is a peer-to-peer, horizontally scalable decentralized data lake that efficiently aggregates a large amount of on-chain and off-chain data, protected by zero-knowledge proofs. As a decentralized worker network, each node is responsible for storing data from specific block subsets, speeding up the data retrieval process by quickly identifying nodes that store the required data.
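The idea of locating the node that holds a given block subset can be sketched as a sorted range lookup. This is a simplified illustration and not Subsquid's actual routing logic: the `ChunkDirectory` class, the worker names, and the block ranges are all hypothetical.

```python
import bisect


class ChunkDirectory:
    """Maps contiguous, non-overlapping block ranges to the worker assumed to store them."""

    def __init__(self, assignments):
        # assignments: iterable of (first_block, last_block, worker_id)
        self._ranges = sorted(assignments)
        self._starts = [first for first, _, _ in self._ranges]

    def worker_for(self, block):
        """Return the worker storing this block, or None if no range covers it."""
        i = bisect.bisect_right(self._starts, block) - 1
        if i >= 0:
            first, last, worker = self._ranges[i]
            if first <= block <= last:
                return worker
        return None

    def workers_for_range(self, from_block, to_block):
        """Workers whose ranges overlap [from_block, to_block], in block order."""
        return [w for first, last, w in self._ranges if first <= to_block and last >= from_block]


# Hypothetical assignment of block ranges to workers.
directory = ChunkDirectory([
    (0, 999_999, "worker-a"),
    (1_000_000, 1_999_999, "worker-b"),
    (2_000_000, 2_999_999, "worker-c"),
])
print(directory.worker_for(1_500_000))
print(directory.workers_for_range(900_000, 1_100_000))
```

Because a query is sent only to the workers whose ranges overlap the requested blocks, adding workers spreads both storage and query load.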
Subsquid also supports real-time indexing, allowing data to be indexed before blocks are finalized. It also supports storing data in formats chosen by developers, such as Parquet or CSV, making it easier to analyze with tools like BigQuery. Additionally, subgraphs can be deployed on the Subsquid network without migrating to the Squid SDK, enabling codeless deployment.
During its testnet phase, Subsquid achieved impressive statistics, with over 80,000 testnet users, over 60,000 Squid indexers deployed, and over 20,000 verified developers on the network. Recently, on June 3rd, Subsquid launched its mainnet data lake.
In addition to indexing, the Subsquid Network data lake can also substitute RPC in use cases like analytics, ZK/TEE co-processors, AI agents, and Oracles.
Indexer Overview: SubQuery
SubQuery is a decentralized middleware infrastructure network that provides RPC and indexing data services. Initially supporting Polkadot and Substrate networks, it has now expanded to include over 200 chains. As with The Graph, it uses proof of indexing: indexers index data and serve query requests, and delegators stake their tokens with indexers. However, instead of curators, it introduces consumers, who submit purchase orders to secure indexer income.
SubQuery will introduce data nodes that support sharding to prevent continuous synchronization of new data between each node, optimizing query efficiency while moving towards greater decentralization. Users can choose to pay approximately 1 SQT token per 1000 requests or set custom fees for indexers through the protocol.
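As a quick worked example of the pricing figures quoted in this article, the sketch below estimates what a given request volume would cost at roughly 1 SQT per 1,000 requests on SubQuery and at about $40 per million queries on The Graph. The monthly volume of 5 million requests is an assumption, and no SQT-to-USD conversion is attempted because the article does not give a token price.

```python
def sqt_cost(requests, sqt_per_1000=1.0):
    """Cost in SQT at a flat rate per 1,000 requests (rate taken from the article)."""
    return requests / 1000 * sqt_per_1000


def usd_cost(requests, usd_per_million=40.0):
    """Cost in USD at a flat rate per million queries (rate taken from the article)."""
    return requests / 1_000_000 * usd_per_million


# Hypothetical monthly request volume for a dApp.
monthly_requests = 5_000_000
print(sqt_cost(monthly_requests), "SQT")
print(usd_cost(monthly_requests), "USD")
```

Both rates are averages or approximations, and custom fees negotiated through the protocol would change the result.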
Although SubQuery only launched its token earlier this year, the issuance rewards for nodes and delegators have also increased in USD value, indicating a growing number of query services provided on its platform. Since TGE, the total staked SQT has increased from 6 million to 125 million, highlighting the growth in network participation.
Indexer Overview: Covalent
Covalent is a decentralized indexer network in which Block Specimen Producer (BSP) nodes create copies of blockchain data through batch exports and publish proofs on the Covalent L1 blockchain. This data is then refined by Block Result Producer (BRP) nodes according to set rules to extract the required data.
Developers can easily extract relevant blockchain data in a consistent request and response format using a unified API without the need for custom complex queries. They can use CQT tokens settled on Moonbeam as a payment method to extract these pre-configured datasets from network operators.
Covalent’s rewards trended upwards overall from Q1 2023 to Q1 2024, partly due to the increase in the price of the Covalent token, CQT.
Considerations when choosing an indexer include:
– Customizability of data: Some indexers like Covalent are general indexers providing standard pre-configured datasets only through API. While they may be fast, they lack flexibility for developers requiring custom datasets. Indexer frameworks allow for more custom data processing to meet application-specific needs.
– Security: Indexed data must be secure, or dApps built on these indexers are vulnerable to attacks. While all indexers adopt some form of security by requiring indexers to stake tokens, some indexer solutions also use proofs to further enhance security.
– Speed and scalability: As blockchains grow, transaction volume increases, making it more challenging to index large amounts of data. Indexer protocols introduce solutions to meet these growing demands efficiently.
– Supported networks: While most blockchain activity still occurs on Ethereum, other blockchains are becoming more popular over time and also require indexing services.
In conclusion, while indexers are widely adopted in dApp development, their potential remains significant, especially when integrated with AI. As AI becomes more prevalent in both Web2 and Web3, its improvement relies on access to relevant data for training models and developing AI agents. Ensuring data integrity is crucial for AI applications to prevent biased or inaccurate information input.
Subsquid has made significant progress in performance and user metrics in the indexer solutions field. Users are starting to experiment with building AI agents using Subsquid, demonstrating the versatility and potential of the platform in the evolving data indexing landscape. Additionally, tools like AutoAgora help indexers use AI to provide dynamic pricing for query services on The Graph, while SubQuery supports multiple AI networks to achieve transparent data indexing.
The integration of AI with indexers is expected to enhance data accessibility and usability in the blockchain ecosystem. By leveraging AI technology, indexers can provide more efficient and accurate data retrieval, enabling developers to build more complex dApps and analytical tools. As AI and indexers continue to evolve together, we remain optimistic about the future of data indexing and its role in shaping the decentralized digital landscape.