Geuis

Reputation: 42267

How to improve performance of update() and save() in MongoDB?

I'm looking for tips on how to improve the database performance in the following situation.

As a sample application, I wrote a fairly simple app today that uses the Twitter streaming API to search for certain keywords, then stores the results in MongoDB. The app is written with Node.js.

I'm storing 2 collections. One stores the keyword and an array of tweet ids that reference each tweet found mentioning that keyword. These are being added to the database using .update() with {upsert:true} so that new ids are appended to the 'ids' array.

A sample document from this collection looks like this:

{ "_id": ObjectId("4e00645ef58a7ad3fc9fd9f9"), "ids": ["id1","id2","id3"], "keyword": "#chocolate" }

Update code:

 keywords.update({keyword: key_word},{$push:{ids: id}},{upsert:true}, function(err){})

The 2nd collection looks like this; documents are added simply by using .save():

 {
     "twt_id": "id1",
     "tweet": { //big chunk of json that doesn't need to be shown }
 }
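
The save itself is just something like this (a minimal sketch; `tweets` is assumed to be the handle to that second collection and `tweet_json` the parsed tweet object):

    // minimal sketch -- 'tweets' is the collection handle, 'tweet_json' the parsed tweet
    tweets.save({twt_id: id, tweet: tweet_json}, function(err) {
        if (err) console.error(err);
    });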

I've got this running on my MacBook right now and it's been going for about 2 hours. I'm storing a lot of data, probably several hundred documents per minute. Right now the number of objects in MongoDB is 120k+.

What I'm noticing is that the CPU usage for the database process is hitting as high as 84% and has been climbing steadily since I started the latest test run.

I was reading up on setting indexes, but since I'm adding documents and not running queries against them, I'm not sure if indexes will help. A side thought that occurred to me is that update() might be doing a lookup since I'm using $push and that an index might help with that.

What should I be looking at to keep MongoDB from eating up ever increasing amounts of CPU?

Upvotes: 9

Views: 17681

Answers (3)

Brendan W. McAdams

Reputation: 9477

You're on the right path. The query portion of your update needs an index, else it is running a table scan. Add an index on keyword and you'll see update performance increase significantly.
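
For example (a minimal sketch, assuming `keywords` is the same collection handle used in your update code):

    // ensure an index on the field used in the update's query portion
    // (shell equivalent: db.keywords.ensureIndex({ keyword: 1 }))
    keywords.ensureIndex({ keyword: 1 }, function(err, indexName) {
        if (err) console.error(err);
    });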

Upvotes: 9

Bryan Migliorisi

Reputation: 9210

It is very likely that you are hitting a very common bottleneck in MongoDB. Since you are updating documents very frequently by adding strings, there is a good chance that you are running out of space allocated for that document, forcing the database to constantly move it to a different spot in memory/disk by rewriting it at the tail end of the data file.

Adding indexes can only hurt write performance, so that will not help unless you are read-heavy.

I would consider changing your application logic to do this:

  1. Index on the keyword field
  2. Before inserting anything into the database each time you detect a tweet, query for the document which contains the keyword. If it does not exist, insert a new document, but pad the ids property with a whole bunch of fake strings. Then, immediately after inserting it, remove all of the fake ids from that array. This will cause MongoDB to allocate additional room for that entire document so that when you start adding ids to the ids field, it will have plenty of room to grow (see the sketch after this list).
  3. Insert the id of the tweet into the ids field
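
A rough sketch of steps 2 and 3, assuming `keywords` is the collection handle from the question and using a made-up padding size and filler string (tune both to the number and length of ids you expect per keyword):

    // Rough sketch of the pre-allocation trick; PAD_SIZE and the filler are guesses.
    var PAD_SIZE = 1000;
    var padding = [];
    for (var i = 0; i < PAD_SIZE; i++) padding.push('00000000000000000000');

    keywords.findOne({keyword: key_word}, function(err, doc) {
        if (!doc) {
            // Insert a padded document so MongoDB allocates extra room up front...
            keywords.insert({keyword: key_word, ids: padding}, function(err) {
                // ...then immediately empty the array; the record keeps its allocated space.
                keywords.update({keyword: key_word}, {$set: {ids: []}}, function(err) {
                    pushId();
                });
            });
        } else {
            pushId();
        }

        function pushId() {
            // Step 3: append the tweet id; the document now has room to grow in place.
            keywords.update({keyword: key_word}, {$push: {ids: id}}, function(err) {
                if (err) console.error(err);
            });
        }
    });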

Upvotes: 12
