lzl124631x

Reputation: 4809

How to paginate among millions of data in firestore?

Background

Our blockchain has tens of shards, each of which contains millions of blocks. Each block has shardID, height, and timestamp fields.

I currently store all the blocks in a single collection called blocks, because I want to sort blocks across all shards together. I used to store each shard's blocks in its own shardBlocks collection, but I couldn't figure out how to sort across collections.

I created a compound index on fields shardID and height.

{
  "collectionGroup": "blocks",
  "queryScope": "COLLECTION",
  "fields": [
    { "fieldPath": "shardID", "order": "ASCENDING" },
    { "fieldPath": "height", "order": "DESCENDING" }
  ]
}

Issue

I'm paginating the blocks with the following code, which I adapted from the Firestore pagination example:

        // orderBy() is synchronous, so no await is needed here.
        let query = this.blocksCollection.orderBy("timestamp", "desc");

        // Offset of the first document on the requested page.
        let start = pageIndex * pageSize;
        if (start) {
            // Read (and discard) everything before the page to find the cursor.
            let skip = await this.blocksCollection
                .orderBy("timestamp", "desc")
                .limit(start)
                .get();
            let prev = skip.docs[skip.docs.length - 1];
            query = query.startAfter(prev);
        }

        let snapshot = await query.limit(pageSize).get();
        return snapshot.docs.map(d => d.data()) as Block[];

But it easily fails with a "Bandwidth exhausted" error. I also recall previously seeing an error message saying the limit is at most 10,000.

Question

I found that if I know the timestamp of the first block on a page, I can use startAt or startAfter to fetch that page very quickly. But I don't know that timestamp in advance :(
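One common alternative to the offset approach is cursor-based pagination: keep the last document snapshot of each page and resume after it, so no skip query is needed. A minimal sketch, assuming a firebase-admin CollectionReference (the helper name getNextPage and the loose typing are made up for illustration):

```typescript
// Cursor-based pagination sketch: instead of computing an offset with a
// skip query, remember the last document of the page just served and
// start the next page right after it.
async function getNextPage(
  blocksCollection: any, // assumed: a Firestore CollectionReference
  pageSize: number,
  lastDoc?: any          // last QueryDocumentSnapshot of the previous page
): Promise<{ blocks: any[]; lastDoc: any }> {
  let query = blocksCollection.orderBy("timestamp", "desc");
  if (lastDoc) {
    // Resume right after the previous page -- no skip query, so each
    // call reads exactly pageSize documents.
    query = query.startAfter(lastDoc);
  }
  const snapshot = await query.limit(pageSize).get();
  return {
    blocks: snapshot.docs.map((d: any) => d.data()),
    // Hand this back on the next request to fetch the following page.
    lastDoc: snapshot.docs[snapshot.docs.length - 1],
  };
}
```

The trade-off is that this only supports "next page" navigation directly; random access to page N still requires a cached cursor for that page (e.g. remembering the last snapshot per page index as the user browses).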

Upvotes: 2

Views: 236

Answers (1)

Thingamajig

Reputation: 4465

Is this perhaps running many times, until limit becomes ridiculously high? You're currently using limit(start), which looks like the culprit: it triggers far more reads than necessary. I would start with a fixed limit and go from there.

If pageSize stays constant and pageIndex keeps increasing, you're re-reading the first documents over and over: each page makes the skip query larger than the last.

For example:

1st page, pageIndex = 0, pageSize = 25, start = 0. This wouldn't load anything in the skip query.

2nd page, pageIndex = 1, pageSize = 25, start = 25. This would load 25 docs.

3rd page, pageIndex = 2, pageSize = 25, start = 50. This would load 50 docs, including the 25 you already loaded for the 2nd page (so 25 of those reads are redundant).
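The redundant reads described above grow quadratically with the page number. A back-of-the-envelope sketch of the arithmetic (illustrative only; the function names are made up):

```typescript
// With offset pagination, page k first reads k * pageSize documents in
// the skip query, then pageSize more for the page itself. A cursor-based
// approach reads only pageSize documents per page.
function offsetPaginationReads(pages: number, pageSize: number): number {
  let total = 0;
  for (let k = 0; k < pages; k++) {
    total += k * pageSize + pageSize; // skip query + the page itself
  }
  return total;
}

function cursorPaginationReads(pages: number, pageSize: number): number {
  return pages * pageSize; // each page reads only its own documents
}

// Browsing 100 pages of 25 blocks:
// offsetPaginationReads(100, 25) -> 126250 reads
// cursorPaginationReads(100, 25) -> 2500 reads
```

At that rate it's easy to see how a browsing session could exhaust a bandwidth quota.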

Upvotes: 1
