Reputation: 23
We are using a Cloud Function to remove all data older than 6 months from our Firestore database. Unfortunately, this ends up reaching a timeout. We created our code based on: https://firebase.google.com/docs/firestore/manage-data/delete-data
We are retrieving the collection that we need to loop over using listDocuments(). We can't use get() in our case, as it will not return all the documents: we have documents that were created without explicitly creating the path to them.
This was actually our first hurdle, as the Cloud Function reached the timeout on that call. Updating our Cloud Function to the latest client version (code changes, see https://github.com/googleapis/nodejs-firestore/issues/825) and increasing the timeout to 300 seconds resolved that problem.
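For reference, the timeout is raised where the function is defined; a minimal sketch using the 1st-gen firebase-functions API, assuming a scheduled (Pub/Sub) trigger and a placeholder function name:

```js
const functions = require('firebase-functions');

// Hypothetical scheduled cleanup function; the name and schedule are
// placeholders. timeoutSeconds raises the limit to 300 seconds.
exports.cleanupOldData = functions
  .runWith({ timeoutSeconds: 300, memory: '1GB' })
  .pubsub.schedule('every 24 hours')
  .onRun(async () => {
    // deletion logic goes here
  });
```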
We are, however, now reaching timeouts on the deletion actions. We have noticed that deletions are really slow on large collections: for instance, deleting 10 documents from a collection of 2,000 documents is slower than deleting 200 documents from a collection of 210 documents. Each deletion can take from a few milliseconds (for small collections) to almost 3 seconds (for large collections). Because batched writes are limited to a maximum of 500 operations [https://firebase.google.com/docs/firestore/manage-data/transactions], we end up with many ~3-second deletions and eventually hit the timeout.
Steps we have taken to try to solve the problem:
- retrieving the documents with listDocuments() instead of get()
- updating to the latest client version and increasing the function timeout to 300 seconds
- splitting the deletions into batched writes of at most 500 operations (sketched below)
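For reference, here is a minimal sketch of that batched-delete loop, assuming the Node.js Admin SDK; deleteCollectionInBatches and the collection path are placeholders, and the six-month age filter is omitted:

```js
const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

// Placeholder helper: deletes every document in a collection,
// 500 at a time (the batched-write limit).
async function deleteCollectionInBatches(collectionPath) {
  // listDocuments() also returns "missing" documents (refs that only
  // exist as parents of subcollections), unlike a query get().
  const refs = await db.collection(collectionPath).listDocuments();

  for (let i = 0; i < refs.length; i += 500) {
    const batch = db.batch();
    refs.slice(i, i + 500).forEach((ref) => batch.delete(ref));
    await batch.commit();
  }
}
```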
Upvotes: 2
Views: 947
Reputation: 1872
Firestore is not meant for this kind of process. I recommend taking a look at Bigtable. However, if you need to use Firestore, I propose the following scheme.
This will ensure that the timeout error will not appear, and you will only be billed for the time the instance is on.
Upvotes: 1
Reputation: 317372
I would suggest reading over the Firestore best practices documentation. In particular, pay attention to the part that mentions "hotspotting":
Avoid high read or write rates to lexicographically close documents, or your application will experience contention errors. This issue is known as hotspotting, and your application can experience hotspotting if it does any of the following:
- Creates new documents at a very high rate and allocates its own monotonically increasing IDs. (Cloud Firestore allocates document IDs using a scatter algorithm. You should not encounter hotspotting on writes if you create new documents using automatic document IDs.)
- Creates new documents at a high rate in a collection with few documents.
- Creates new documents with a monotonically increasing field, like a timestamp, at a very high rate.
- Deletes documents in a collection at a high rate.
- Writes to the database at a very high rate without gradually increasing traffic.
You might be able to improve performance if you randomize the document deletes so that they aren't "sequential" from the point of view of Firestore's internal sharding. If you can effectively parallelize the deletes across more shards, you could see a performance boost.
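A minimal sketch of that idea, assuming the document references have already been collected with listDocuments(); the shuffle helper, batch size, and degree of parallelism are illustrative placeholders:

```js
// Fisher-Yates shuffle so deletes don't hit lexicographically
// adjacent documents in order.
function shuffle(arr) {
  for (let i = arr.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [arr[i], arr[j]] = [arr[j], arr[i]];
  }
  return arr;
}

// Commit several smaller, shuffled batches concurrently to spread
// the writes across more of Firestore's key range.
async function deleteShuffled(db, refs, batchSize = 250, parallel = 4) {
  const shuffled = shuffle([...refs]);
  for (let i = 0; i < shuffled.length; i += batchSize * parallel) {
    const commits = [];
    for (let p = 0; p < parallel; p++) {
      const chunk = shuffled.slice(i + p * batchSize, i + (p + 1) * batchSize);
      if (chunk.length === 0) break;
      const batch = db.batch();
      chunk.forEach((ref) => batch.delete(ref));
      commits.push(batch.commit());
    }
    await Promise.all(commits);
  }
}
```

Whether this helps depends on how the deletes map onto Firestore's internal shards; it trades extra commit round-trips for a better spread of the write load.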
Upvotes: 1