Reputation: 39
I am volunteering for a non-profit, building a pretty data-intensive app. This is my first time using Firestore, so I'm still learning best practices and optimization strategies. I'd like to reduce infrastructure costs as much as possible. Ideally, I'd cut costs down to the free tier, since the app doesn't have a stable source of funding, but I realize that may not be a realistic goal.
I have built the naive MVP version of this app, and it costs more than I feel like it should. I am completely willing to over-engineer the solution. I am even willing to switch the database to a different technology at this point.
Currently, I have a few collections that have ~10k documents each. Each document is basically just a key:value pair. In order to load the app, I need all (or most) of that data. The website gets maybe 500 visits per day, so I'm racking up a bit more than I'd like in read costs.
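To put rough numbers on this, here is a back-of-the-envelope sketch. The collection count of 3 is an assumption standing in for "a few", and the 50,000 reads/day free quota is the figure as I understand it, so it should be checked against current Firestore pricing. The "sharded" case assumes about 10 documents per collection instead of 10k.

```python
# Rough estimate of daily document reads; several inputs are assumptions.
DOCS_PER_COLLECTION = 10_000   # from the question (~10k)
COLLECTIONS = 3                # assumption: stands in for "a few"
VISITS_PER_DAY = 500           # from the question
FREE_READS_PER_DAY = 50_000    # assumption: verify against current pricing


def daily_reads(docs_per_visit: int, visits: int) -> int:
    """Each document fetched on page load is billed as one read."""
    return docs_per_visit * visits


# Naive model: every visit reads every document in every collection.
naive_reads = daily_reads(DOCS_PER_COLLECTION * COLLECTIONS, VISITS_PER_DAY)

# Hypothetical model with ~10 large documents per collection.
sharded_reads = daily_reads(10 * COLLECTIONS, VISITS_PER_DAY)

print(f"naive:   {naive_reads:,} reads/day")
print(f"sharded: {sharded_reads:,} reads/day")
print(f"sharded fits free quota: {sharded_reads <= FREE_READS_PER_DAY}")
```

Under these assumptions the naive model is several orders of magnitude above the free quota, while the few-large-documents model sits below it.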
The data in the biggest collection only gets updated about once a day, with acceptable latency of a few days, so a good caching strategy would be very welcome.
Here's the main issue, though: most of the advice I've seen says to prefer many small documents over fewer documents with more data. Why is this the case? I'm wondering whether, in my case, it would be better to use a few large documents, each holding (not quite) as much data as will fit. Specifically, I would have n=10 or so documents in each collection that together represent the data in all 10k documents. I could determine which document each key:value pair goes in by hashing the key and taking the result modulo n. Then, when I get data, I would just be reading 10 docs. What are the downsides to this? Is there a better approach? Basically, I'm looking for perspectives other than my own so I don't get stuck with a worse app later on.
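A minimal sketch of the hash-and-mod idea, kept independent of any Firestore client library. The function names `shard_index` and `build_shards` are hypothetical, and SHA-256 is just one reasonable choice of stable hash. Python's built-in `hash()` is avoided on purpose because it is salted per process for strings, so it would not assign keys to the same shard across runs.

```python
import hashlib
from typing import Any, Dict, List


def shard_index(key: str, n: int = 10) -> int:
    """Map a key to one of n shard documents using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce mod n.
    return int.from_bytes(digest[:8], "big") % n


def build_shards(pairs: Dict[str, Any], n: int = 10) -> List[Dict[str, Any]]:
    """Group key:value pairs into n dicts, one per shard document."""
    shards: List[Dict[str, Any]] = [{} for _ in range(n)]
    for key, value in pairs.items():
        shards[shard_index(key, n)][key] = value
    return shards


if __name__ == "__main__":
    data = {f"key{i}": i for i in range(10_000)}
    shards = build_shards(data, n=10)
    # Shard sizes should be roughly even, but are not guaranteed to be equal.
    print([len(s) for s in shards])
```

Because the shard for a key is computed from the key alone, a single-pair update only needs to rewrite one shard document, and loading everything is n reads per collection.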
Upvotes: 0
Views: 46
Reputation: 598603
The problem starts here:
"Currently, I have a few collections that have ~10k documents each. Each document is basically just a key:value pair. In order to load the app, I need all (or most) of that data."
Any time a single page/screen of your app needs to load more than a few dozen documents, you should reconsider your data model and/or the choice of a NoSQL database. Keep in mind: each document is essentially a file, and I hope you'd frown at opening tens of thousands of files for each individual page view.
In NoSQL databases you should model the data for the use-cases of your app.
If your app needs 10k key-value pairs, store all those pairs in a single document (or a few documents). That way each view takes only one or a few document reads. This may mean that you need to remodel the data and/or store duplicated data, which complicates the writing of data. But in NoSQL this is exactly the trade-off you're supposed to make: make your reads as cheap as possible, both in cost and in other resource usage, even if that makes your write operations more complex.
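As a rough illustration of "a single document (or a few documents)", the sketch below packs pairs greedily into as few dicts as fit a size budget. The function name `pack_pairs` and the 900,000-byte budget are assumptions: the budget is chosen to leave headroom under Firestore's 1 MiB per-document limit, and the JSON-based size estimate is only an approximation of how Firestore actually measures document size.

```python
import json
from typing import Dict, List

# Assumption: headroom under Firestore's 1 MiB (1,048,576 byte) document limit.
MAX_DOC_BYTES = 900_000


def pack_pairs(pairs: Dict[str, str],
               max_bytes: int = MAX_DOC_BYTES) -> List[Dict[str, str]]:
    """Greedily pack key:value pairs into as few documents as the budget allows."""
    docs: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    current_size = 0
    for key, value in pairs.items():
        # Approximate the entry's stored size by its JSON encoding.
        entry_size = len(json.dumps({key: value}).encode("utf-8"))
        # Start a new document when adding this entry would exceed the budget.
        if current and current_size + entry_size > max_bytes:
            docs.append(current)
            current, current_size = {}, 0
        current[key] = value
        current_size += entry_size
    if current:
        docs.append(current)
    return docs


if __name__ == "__main__":
    data = {f"k{i}": "v" * 10 for i in range(10_000)}
    docs = pack_pairs(data)
    print(len(docs), "document(s) needed")
```

The cost of this layout shows up on the write side: changing one pair means rewriting the whole document that contains it, which is the trade-off described above.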
If you're new to this sort of consideration, I recommend reading "NoSQL Data Modeling Techniques" and watching "Get to know Firestore".
Upvotes: 1