llrs
llrs

Reputation: 3397

Creating index while updating the documents

I have a collection I am updating adding a new field. The document looks like:

{"A": "P145", "B":"adf", "C":[{"df":"14", "color":"blue"},{"df":17}], 
     "_id":ObjectID(....), "Synonyms":{"Synonym1": "value1", 
          "Synonym2": ["value1", "value2"]}}

In the update I am adding new elements to C

I want to create a index on the field A and B. A and B are 20206 unique fields. The queries to the database will be based on these fields. The "_id" is set by default.

I plan to do it with collection.ensure_index({"A":1, "B":1}, background=True)

How much time could it need? It will be faster than the system index based on "_id"?

Upvotes: 0

Views: 1029

Answers (2)

Sachin Shukla
Sachin Shukla

Reputation: 171

Well, the time taken to create the index totally depends on the hardware (system) you are using and the number of records. for ~20K records it should be quick and not take more time. max few seconds in worst case. Little off topic, but i see that you have given background true option, probably its not needed as these background options are used while create a very large data set.Please consider few things while creating index, not only for this question but in general.

  1. when you create Index in foreground they block the operation and wouldn't allow the read operation and that the reason background true is used. http://docs.mongodb.org/v2.2/administration/indexes/
  2. the good part with foreground index creation is that the indexes are more compact and better compare to background. hence it should be preferred.
  3. The good news is over a long run, both background index creation and foreground delivers the same performance and does'nt matter which way the indexes were created. ... Happy Mongoing.. ;-)

-$

Upvotes: 0

Mark Hughes
Mark Hughes

Reputation: 7374

The amount of time it takes to add the index would depend on your hardware, but with 20206 records a simple index as you describe shouldn't take very long for most hardware.

Queries fully covered by the index (i.e. where you specify A and B, or just A, but not just B - indexes cover from left to right so unless you include A in the select, the index can't be used) will be much faster to retrieve the results. Unless you are searching by _id, the default index on _id won't help you at all; queries on A and B will have to perform a full collection scan without your proposed index, which is orders of magnitude slower than an index scan.

Inserts will be slightly slower as the index will need to be updated too, but again with a relatively small number of total documents, this isn't likely be a large overhead.

The updates to change the C collection may well be faster if you are using A and B to identify which document to update, as they will benefit from the faster search, and the update should not be impacted once the data is found as the index should not need changing.

As the absolute performance will be specific to your hardware, if you're concerned about it the best thing to do is try it out on a copy of the data (on similar hardware) and measure whether the performance meets your needs. The output from explaining the query can be very informative in understanding how your indexes are impacting your query performance.

Upvotes: 2

Related Questions