whysoserious

Reputation: 728

Creating an index takes a very long time

I created a collection in MongoDB consisting of 11446615 documents.

Each document has the following form:

{ 
 "_id" : ObjectId("4e03dec7c3c365f574820835"), 
 "httpReferer" : "http://www.somewebsite.pl/art.php?id=13321&b=1", 
 "words" : ["SEX", "DRUGS", "ROCKNROLL", "WHATEVER"],     
 "howMany" : 3 
}

httpReferer: just a URL

words: words parsed from the URL above. The size of the list is between 15 and 90.

I am planning to use this database to obtain a list of webpages with similar content.

I'll be querying this collection using the words field, so I created (or rather started creating) an index on this field:

db.my_coll.ensureIndex({words: 1})
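The queries I plan to run will look roughly like this (the word list here is just an example):

// find pages whose word list contains any of the given words
db.my_coll.find({words: {$in: ["SEX", "DRUGS"]}})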

I started creating the index about 3 hours ago, and it doesn't look like it will finish within the next 3 hours.

How can I speed up the indexing? Or should I take a completely different approach to this problem? Any ideas are welcome :)

Upvotes: 18

Views: 28434

Answers (4)

Big Sam

Reputation: 1388

In 2024, there are two good options for creating large indexes on MongoDB replica sets and sharded clusters. Option #1 is the widely recommended one; however, the index I had to create was going to take around 10 days per node with the database on a traditional HDD. That is not viable for large replica sets or sharded clusters, so Option #2 was much faster for me.

Option #1 - Rolling Index Build

Option #2 - Create the index on one node (per replica set), back up the database files, and then copy that backup to all the other nodes. Just make sure the oplog is large enough to cover the time window it takes to transfer the backup. The index build will also complete orders of magnitude faster if you build it on a node whose database is on an SSD or NVMe drive.
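As a rough sanity check before committing to this, you can inspect the oplog window on the primary; the "log length start to end" it reports should comfortably exceed the time the copy/restore takes:

// print the oplog size and the time range it currently covers
rs.printReplicationInfo()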

Upvotes: 0

I also faced the same issue and resolved it with the following workaround.

  1. Rename the collection you want to index.

db.<existing_collection>.renameCollection("<rename_collection>");

  2. Create the index on the original collection name (which is now empty).

db.<existing_collection>.createIndex({<option_goes_here>});

  3. Copy the data from the renamed collection back into the original collection.

db.<rename_collection>.find().forEach(function(doc){db.<existing_collection>.insert(doc);});

  4. Drop the renamed collection.

db.<rename_collection>.drop();
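As a side note, a document-by-document forEach copy can be quite slow for millions of documents. If your shell supports insertMany, a batched copy along these lines (the batch size of 1000 is just an example, and the angle-bracketed names are placeholders as above) should be noticeably faster:

// copy documents back in batches instead of one insert per document
var batch = [];
db.<rename_collection>.find().forEach(function (doc) {
    batch.push(doc);
    if (batch.length === 1000) {
        db.<existing_collection>.insertMany(batch);
        batch = [];
    }
});
if (batch.length > 0) { db.<existing_collection>.insertMany(batch); }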

Upvotes: 0

Dharshan

Reputation: 404

Background indexes also have some issues.

  1. If anything, it will take longer, because the build competes with the normal load on your server.
  2. If it is interrupted for some reason, it will restart as a foreground build.

If you have a replica set, I prefer to do a "rolling index build":

  1. Take a secondary out of the replica set
  2. Build the index
  3. Add the secondary back into the replica set

I think this is the cleanest solution.
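A rough sketch of what those steps look like for one secondary; the data path, port, database name, and index spec are placeholders for whatever your deployment actually uses:

# 1. stop the secondary, then restart it as a standalone on a different port
#    (omit --replSet so it cannot rejoin the set while the index builds)
mongod --dbpath /data/db --port 27217

# 2. connect to that standalone process and build the index in the foreground
mongo --port 27217 --eval 'db.getSiblingDB("mydb").my_coll.ensureIndex({words: 1})'

# 3. shut it down, restart it with its original --replSet options, let it catch
#    up from the oplog, then repeat for the next secondary (and finally the
#    stepped-down primary)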

Upvotes: 4

Andz

Reputation: 2258

Nope, indexing is slow for large collections. You can create the index in the background instead:

db.my_coll.ensureIndex({words:1}, {background:true});

Creating the index in the background will be slower and result in a larger index, and the index can't be used until the build is complete. In the meantime, though, you'll be able to use the database normally, because the background build won't block other operations.
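If you want to watch how far along the build is, db.currentOp() reports a progress message for it; a small helper along these lines (the "Index Build" substring match is an assumption about how that message is worded) will print it:

// list in-progress operations and print the index build's progress message
db.currentOp().inprog.forEach(function (op) {
    if (op.msg && op.msg.indexOf("Index Build") !== -1) {
        print(op.msg);    // e.g. "Index Build (background) ... 45%"
    }
});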

Upvotes: 24
