How Subsquid Makes Blockchain Data More Affordable

Subsquid Labs GmbH

02 Nov 2023 • 4 min read

How Subsquid Makes Blockchain Data More Affordable

For years, using an RPC endpoint has been the go-to method for fetching blockchain data for analytics and decentralized applications. However, this conventional approach comes with its own set of challenges: Free RPC endpoints, crucial for blockchain developers, are often plagued by severe rate limitations, making fast data retrieval a costly affair. Eventually, solutions like the Graph have come to appear, helping bring down the cost of the data access, however these tools are still comparatively costly and slow.

Moreover, the strain that RPC requests place on blockchain nodes—especially in networks with high transaction volumes—can impair node performance and slow down the network as a whole. The resources required for pulling large datasets through RPC endpoints are significant and can severely affect an application's responsiveness. Furthermore, reliance on specific nodes for RPC requests presents a vulnerability. If these nodes experience downtime or other issues, applications dependent solely on RPC could face interruptions.

Developers have tried to sidestep these issues by using databases, but this has been more of a palliative measure than a solution, especially concerning the retrieval of historical data.

Enter Subsquid, a decentralized Web3 data lake, that not only alleviates these issues but also revolutionizes how we access blockchain data by significantly reducing costs and increasing efficiency. In addition, the project provides developers with the open-source Squid SDK, making it even easier to interact with blockchain data.

Here's how Subsquid reshapes the landscape:

Free Access to Historical Data

Subsquid enables completely free retrieval of historical blockchain data. This completely changes the way developers have fetched blockchain data before, eliminating a significant portion of the costs. By providing unrestricted access to archival data, Subsquod lifts the financial burden and opens up new possibilities for data-heavy applications and blockchain analytics.

While the SDK is already set up to fetch archival data from the network, meaning that running a squid locally and without an RPC endpoint is completely free, it is also possible to fetch data directly from the archive network using its API. The main archive URL now points at a router that provides URLs of workers that do the heavy lifting. Each worker has its own range of blocks that it serves. The recommended data retrieval procedure is as follows:

Retrieve the archive height from the router with GET /height.
Query the archive for an URL of a worker that has the data for the first block of the relevant range with GET /${firstBlock}/worker.
Retrieve the data from the worker with POST /, making sure to set the "fromBlock" query field to ${firstBlock}.
Exclude the received blocks from the relevant range by setting firstBlock to the value of header.number of the last received block.
Repeat steps 2-4 until all the required data is retrieved.

Implementation example:

Full code is available here.

Integrated Database Service and API

Squid SDK combines flexibility and quick setup. Subsquid delivers an out-of-the-box solution by integrating a built-in database service with a GraphQL API, thus eliminating the need for complex setups. However, developers are not limited to using only this option. The storage options are highly versatile, supporting not just Postgres but also local storage and formats like CSV and Parquet, S3 buckets, or any custom solution.

Implementation example of using a local CSV table. More details here.

Full code available here.

Real-time Data Access and Advanced Indexing

Subsquid SDK supports indexing blocks before they are finalized, enabling real-time use cases. To take advantage of this feature, a squid processor must

have a node RPC endpoint as one of its data sources and
use a Database with hot blocks support (e.g. TypeormDatabase) in its processor.run() call, and
(EVM only) not set the useArchiveOnly(true) setting.

While developers can use any RPC endpoint, Subsquid Cloud Service provides an RPC service with easy setup and without rate limitations. Since the service operates on a subscription model, the cost is significantly lower compared to other services with a pay-per-query model.

To start, you only need to modify the squid manifest, enabling the service and specifying the scale according to the needs of the project.

More information about the scale and rates is available here.

This approach doesn't just cut costs; it also guarantees uninterrupted data access and guarantees validity of the data. Using an indexed database drastically lowers the cost of reading data, offering a financially sustainable solution, particularly as the user bases of decentralized applications grow.

Enhanced Flexibility and Simplified Data Transformation

Subsquid network together with the Squid SDK provides developers with the freedom to effortlessly transform and store data in their chosen formats. Unlike the rigid structures often seen in RPC data, this flexibility eases integration, boosting overall development efficiency. You can specify tables and data types when using the Parquet format, cutting out the need for complicated data transformations later on. This comes in handy when the data is used for analytics, since parquets can be queried with DuckDB and converted into Polars or Numpy data frames in one line:

More examples in our datasets repository.

To sum up, Subsquid advances the indexing landscape by offering efficient, cost-effective access to historical and real-time blockchain data, countering the limitations of methods used previously. The flexibility of the Squid SDK simplifies the integration process, making data manipulation simpler and more developer-friendly. And together with Subsquid network it provides unlimited capabilities for data fetching and manipulation while also significantly reducing costs.

Written by Daria Agadzhanova, DevRel@ Subsquid