Reputation: 1332
I have a MongoDB database containing millions of tweets, and it appears that to improve query performance I will need to add one or more additional compound indexes. There are already individual indexes on the fields that will go into the compound index. Before I just go adding these indexes, I want some idea of what the impact will be, both short term (is it going to take days to build the initial index?) and long term (is it going to slow down inserts enough to outweigh the benefit of speeding up the queries?).
This is not a production database - no replicas, no shards. It lives on a low-end Linux box which doesn't run much, if anything, else. As I said, millions of tweets. The relevant fields are userid and tweeted_at (a date). There's little enough free space on the machine that I don't have anywhere to put a copy of my db, so trying it out on a copy isn't really an option. I don't expect exact answers, just some guidance.
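To make this concrete, the index I have in mind would look something like the sketch below (mongo shell; "tweets" is just a stand-in for my actual collection name, and the key order/direction would depend on the actual queries):
// Sketch only - "tweets" is a placeholder for my collection name.
// The individual indexes that already exist:
//   db.tweets.createIndex({ userid: 1 })
//   db.tweets.createIndex({ tweeted_at: 1 })
// The compound index I am considering adding:
db.tweets.createIndex({ userid: 1, tweeted_at: 1 })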
Upvotes: 2
Views: 1218
Reputation: 111
Your database having millions of tweets is not necessarily an issue. As Markus said, it could bring your querying to a halt, in particular if you aren't sharded.
The most important additional thing I have experienced (working with tens and hundreds of millions of records) is that it is not the number of records so much as the size of those records: indexing TBs of data spread over a million records is considerably slower than indexing 100 GB spread over 100 million records.
Your actual mileage will vary, but I would assume that a database of tweets has small enough records that building the index should be relatively quick, and it shouldn't add a meaningful amount of time to document creation.
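If you want to see which case you are in before building anything, the collection stats give you the document count, the average document size, and the current index footprint. A rough sketch in the mongo shell (the collection name is a placeholder):
// Rough sketch - replace "tweets" with the actual collection name.
var s = db.tweets.stats()
print("documents:        " + s.count)
print("avg object size:  " + s.avgObjSize + " bytes")
print("data size:        " + s.size + " bytes")
print("total index size: " + s.totalIndexSize + " bytes")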
Upvotes: 0
Reputation: 20703
While the index is being built, the database will block other operations unless you create the index with the {background:true} option. However, this is kind of a Very Bad Idea™ as it slows down index creation considerably.
I just ran a quick test (several mongod instances plus a mongos on the same machine - my heavily loaded MacBook Air, which was compiling a Spring application at the same time), and creating a compound index on 2 values in 1M docs took roughly 16 seconds. Even if we assume that each mongod used one CPU and the processing was done in parallel, the disk would have been the limiting factor. I think it is safe to assume that we are talking about minutes rather than hours, let alone days.
Edit: To speed things up, you might want to remount the corresponding partition with the noatime flag set.
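If you want your own number, you can time a foreground build directly from the shell; something along these lines, with the collection and field names assumed from the question:
// Sketch, assuming the collection is called "tweets" and the fields are
// "userid" and "tweeted_at" as described in the question.
var start = new Date()
db.tweets.createIndex({ userid: 1, tweeted_at: 1 })        // foreground build: blocks operations, but faster
// db.tweets.createIndex({ userid: 1, tweeted_at: 1 }, { background: true })  // non-blocking, but slower
print("index build took " + (new Date() - start) + " ms")
A long-running build can also be watched from a second shell with db.currentOp().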
Upvotes: 2