Reputation: 1737
Let's say I have a collection with tens of thousands of documents, and I'm storing the entire collection in the cache on the client to avoid making tons of unnecessary reads each time the app is opened. To get the latest content, the client just needs to query for all documents with a last_modified time after the previous fetch. This works great when creating or updating documents in the collection, but what happens when you want to delete something? If one client removes a document from the database, the other clients will never realize it's missing, and if you add a deleted field instead, the document ties up storage space forever, both on the server and on every client that's ever fetched it.
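For concreteness, a minimal sketch of the delta fetch described above (Firebase web SDK, modular API). The activities collection, the last_modified field, the deleted tombstone flag, and the in-memory cache are all illustrative assumptions, not my actual code:

```typescript
// Sketch of the last_modified delta sync, with a "deleted" flag as the tombstone.
// Assumes initializeApp(...) has already been called elsewhere.
import {
  collection,
  getDocs,
  getFirestore,
  query,
  where,
  Timestamp,
} from "firebase/firestore";

const db = getFirestore();

// Stand-in for whatever client-side store holds the previously fetched documents.
const localCache = new Map<string, Record<string, unknown>>();

async function syncSince(lastFetch: Timestamp): Promise<Timestamp> {
  const q = query(
    collection(db, "activities"),
    where("last_modified", ">", lastFetch)
  );
  const snapshot = await getDocs(q);

  let newest = lastFetch;
  snapshot.forEach((docSnap) => {
    const data = docSnap.data();
    const modified = data.last_modified as Timestamp;
    if (modified.toMillis() > newest.toMillis()) {
      newest = modified;
    }
    if (data.deleted) {
      // Tombstone: drop it locally. The tombstone document itself still lives
      // on the server forever, which is exactly the problem described above.
      localCache.delete(docSnap.id);
    } else {
      localCache.set(docSnap.id, data);
    }
  });

  // Persist this as the cursor for the next sync.
  return newest;
}
```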
What techniques are there to deal with this kind of thing? I'm open to making major changes to the architecture I've described if it's necessary to solve this problem, so I'd prefer an answer of "start over and do it like this" to "it's impossible". I've considered reusing deleted documents when new content is added instead of creating new ones, but that still means the database can never shrink.
In summary, the requirements for my app are:
Edit: The method I'm using for improving query performance is from the docs, at the bottom of this page. I'm making a time tracker, where each activity is a very lightweight document (<1kb) and a user might log up to 10K activities per year. Each user only sees their own activities, but they might want to log and view them on multiple devices. Keeping the entire history on each device makes it very easy to calculate statistics on the fly without having to worry about read costs, and the data is small enough that I don't need to worry about storage space. I've considered optimizing by combining activities into larger documents or by pre-calculating the aggregations ahead of time, but that feels a lot more like premature optimization than this caching strategy does.
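As an example of what I mean by calculating statistics on the fly, something like this works entirely against the local cache (the collection path and the duration_minutes field are just placeholders for my actual schema):

```typescript
// Sketch: computing a statistic from the locally cached history only.
// Assumes the SDK's persistent cache is enabled and that each activity
// document carries a numeric "duration_minutes" field (illustrative names).
import { collection, getDocsFromCache, getFirestore } from "firebase/firestore";

async function totalMinutesOnThisDevice(): Promise<number> {
  const db = getFirestore();
  // getDocsFromCache never touches the server, so this costs zero reads.
  const snapshot = await getDocsFromCache(collection(db, "activities"));
  let total = 0;
  snapshot.forEach((docSnap) => {
    const minutes = docSnap.data().duration_minutes;
    total += typeof minutes === "number" ? minutes : 0;
  });
  return total;
}
```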
Upvotes: 1
Views: 172
Reputation: 599601
Micro-optimizing the API calls for document reads is typically not the best approach to using Firestore.
To get the latest content, the client just needs to query for all documents with a last_modified time after the previous fetch.
If all documents are already in the local cache, you gain nothing by doing a query like this.
Say you have your 1,000 documents already cached, and one was added or updated. Doing your query results in one document being read from the server, but so does simply reading the entire collection.
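This relies on the SDK's local persistence: with the persistent cache enabled and a listener on the collection, unchanged documents are served from the on-device cache and docChanges() reports only the delta. A rough sketch, assuming the newer web SDK and an activities collection (the name is illustrative):

```typescript
// Sketch: letting the SDK's persistent cache do the delta work instead of a
// hand-rolled last_modified query.
import { initializeApp } from "firebase/app";
import {
  collection,
  initializeFirestore,
  onSnapshot,
  persistentLocalCache,
} from "firebase/firestore";

const app = initializeApp({ /* your config */ });
const db = initializeFirestore(app, { localCache: persistentLocalCache() });

// Listen to the whole collection; unchanged documents come from the local
// cache, and docChanges() only reports documents that were added, modified,
// or removed since the previous snapshot.
const unsubscribe = onSnapshot(collection(db, "activities"), (snapshot) => {
  snapshot.docChanges().forEach((change) => {
    console.log(change.type, change.doc.id);
  });
});
```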
A much better question here is why you are reading and caching 1,000 documents to begin with. Is the average user of your app really going to see all of those 1,000 documents? It seems like an awful lot of information to show on a single screen.
It might be better to only load the first screenful of information, and then load the rest on-demand.
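Loading one screenful and paging in the rest on demand could use query cursors, for example (ordering field, page size, and collection name are assumptions):

```typescript
// Sketch: load one page, then fetch the next page on demand with a cursor.
import {
  collection,
  getDocs,
  getFirestore,
  limit,
  orderBy,
  query,
  startAfter,
  QueryDocumentSnapshot,
} from "firebase/firestore";

const db = getFirestore();
const pageSize = 25;

async function loadPage(after?: QueryDocumentSnapshot) {
  const base = query(
    collection(db, "activities"),
    orderBy("last_modified", "desc"),
    limit(pageSize)
  );
  const q = after ? query(base, startAfter(after)) : base;
  const snapshot = await getDocs(q);
  // Keep the last document around as the cursor for the next page.
  const cursor = snapshot.docs[snapshot.docs.length - 1];
  return { docs: snapshot.docs, cursor };
}
```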
You could also consider whether you really need the entire documents. For example: if you only show the headline of each document, could you combine the 1,000 headlines into a single document (say called recent_headlines) and load that instead?
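That could be as simple as maintaining one summary document that readers fetch in a single call; the aggregates/recent_headlines path and field layout here are assumptions:

```typescript
// Sketch: one combined document instead of 1,000 individual reads.
import { doc, getDoc, getFirestore, setDoc } from "firebase/firestore";

interface Headline {
  id: string;
  title: string;
}

const db = getFirestore();
const summaryRef = doc(db, "aggregates", "recent_headlines");

// Whoever writes the source documents (a client or a Cloud Function) also
// rewrites the summary document with the latest headlines.
async function writeSummary(headlines: Headline[]) {
  await setDoc(summaryRef, { headlines });
}

// Readers pay for a single document read instead of one read per headline.
async function readSummary(): Promise<Headline[]> {
  const snapshot = await getDoc(summaryRef);
  return (snapshot.data()?.headlines as Headline[]) ?? [];
}
```

Keep in mind that a single document is capped at 1 MiB, so this only works while the combined payload stays small.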
Finally, consider using data bundles to bundle updates together, serve them from a CDN and reduce the number of document reads you're charged for.
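On the client, consuming a bundle served from a CDN might look like this (the bundle URL and the named query are assumptions; the bundle itself would be built server-side and cached on the CDN):

```typescript
// Sketch: loading a pre-built Firestore data bundle from a CDN.
import {
  getDocsFromCache,
  getFirestore,
  loadBundle,
  namedQuery,
} from "firebase/firestore";

async function loadFromBundle() {
  const db = getFirestore();

  const response = await fetch("https://cdn.example.com/bundles/latest");
  // loadBundle populates the local cache; the documents were already read
  // (and billed) once when the bundle was built, not per client.
  await loadBundle(db, await response.arrayBuffer());

  // The named query is defined when the bundle is built server-side.
  const q = await namedQuery(db, "latest-activities");
  if (q) {
    const snapshot = await getDocsFromCache(q);
    snapshot.forEach((docSnap) => console.log(docSnap.id));
  }
}
```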
Upvotes: 2