Reputation: 3397
I have a collection I am updating adding a new field. The document looks like:
{"A": "P145", "B":"adf", "C":[{"df":"14", "color":"blue"},{"df":17}],
"_id":ObjectID(....), "Synonyms":{"Synonym1": "value1",
"Synonym2": ["value1", "value2"]}}
In the update I am adding new elements to C
I want to create a index on the field A and B. A and B are 20206 unique fields. The queries to the database will be based on these fields. The "_id" is set by default.
I plan to do it with collection.ensure_index({"A":1, "B":1}, background=True)
How much time could it need? It will be faster than the system index based on "_id"?
Upvotes: 0
Views: 1029
Reputation: 171
Well, the time taken to create the index totally depends on the hardware (system) you are using and the number of records. for ~20K records it should be quick and not take more time. max few seconds in worst case. Little off topic, but i see that you have given background true option, probably its not needed as these background options are used while create a very large data set.Please consider few things while creating index, not only for this question but in general.
-$
Upvotes: 0
Reputation: 7374
The amount of time it takes to add the index would depend on your hardware, but with 20206 records a simple index as you describe shouldn't take very long for most hardware.
Queries fully covered by the index (i.e. where you specify A and B, or just A, but not just B - indexes cover from left to right so unless you include A in the select, the index can't be used) will be much faster to retrieve the results. Unless you are searching by _id, the default index on _id won't help you at all; queries on A and B will have to perform a full collection scan without your proposed index, which is orders of magnitude slower than an index scan.
Inserts will be slightly slower as the index will need to be updated too, but again with a relatively small number of total documents, this isn't likely be a large overhead.
The updates to change the C collection may well be faster if you are using A and B to identify which document to update, as they will benefit from the faster search, and the update should not be impacted once the data is found as the index should not need changing.
As the absolute performance will be specific to your hardware, if you're concerned about it the best thing to do is try it out on a copy of the data (on similar hardware) and measure whether the performance meets your needs. The output from explaining the query can be very informative in understanding how your indexes are impacting your query performance.
Upvotes: 2