Reputation: 42267
I'm looking for tips on how to improve the database performance in the following situation.
As a sample application, I wrote a fairly simple app today that uses the Twitter streaming API to search for certain keywords, then I am storing the results in MongoDB. The app is written with Node.js.
I'm storing two collections. One stores each keyword along with an array of tweet IDs that reference the tweets mentioning that keyword. These are added to the database using .update() with $push and {upsert:true}, so that the document is created if it doesn't exist yet and new IDs are appended to the 'ids' array.
A sample document from this collection looks like this:
{ "_id": ObjectId("4e00645ef58a7ad3fc9fd9f9"), "ids": ["id1","id2","id3"], "keyword": "#chocolate" }
Update code:
keywords.update({keyword: key_word},{$push:{ids: id}},{upsert:true}, function(err){})
Documents in the second collection look like this and are added simply by using .save():
{
"twt_id": "id1",
"tweet": { /* big chunk of JSON that doesn't need to be shown */ }
}
I've got this running on my MacBook right now and it's been going for about 2 hours. I'm storing a lot of data, probably several hundred documents per minute. Right now the number of objects in MongoDB is 120k+.
What I'm noticing is that the CPU usage of the database process is hitting as high as 84% and has been climbing gradually ever since I started the latest test run.
I was reading up on setting indexes, but since I'm adding documents and not running queries against them, I'm not sure if indexes will help. A side thought that occurred to me is that update() might be doing a lookup since I'm using $push and that an index might help with that.
What should I be looking at to keep MongoDB from eating up ever increasing amounts of CPU?
Upvotes: 9
Views: 17681
Reputation: 236
https://docs.mongodb.com/manual/reference/operator/update/positional/#up.S
Hope this helps!
The positional $ operator identifies an element in an array to update without explicitly specifying the position of that element in the array. See also: https://getvideo.pro/watch/mongodb-querying-sub-documents-and-using-the-positional-operator-in-projection-vid-fEvYkBDW0Iw or: https://getvideo.pro/watch/mongodb-a-to-z-video-18-updating-multiple-documents-with-positional-operator-vid-Z2dTXbktLEQ
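To make the description concrete, here is a minimal sketch of what an update using the positional $ operator could look like against the keyword documents from the question. The helper name replaceTweetIdArgs and the sample values are made up for illustration; only the filter/update shapes are the point.

```javascript
// Sketch only: builds the arguments for an update that uses the positional $ operator.
// replaceTweetIdArgs is a hypothetical helper, not part of the MongoDB driver.
function replaceTweetIdArgs(keyword, oldId, newId) {
  return {
    // The array field must appear in the filter so that $ knows which element matched.
    filter: { keyword: keyword, ids: oldId },
    // "ids.$" refers to the first element of ids that matched the filter.
    update: { $set: { "ids.$": newId } },
  };
}

const args = replaceTweetIdArgs("#chocolate", "id2", "id2-fixed");
console.log(JSON.stringify(args));
// With the Node.js driver this could be passed along as:
//   keywords.updateOne(args.filter, args.update)
```

Note that this replaces an existing array element; it does not change how $push appends new ones.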
Upvotes: 0
Reputation: 9477
You're on the right path. The query portion of your update needs an index; otherwise it is running a full collection scan for every update. Add an index on keyword and you'll see update performance increase significantly.
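As a rough sketch of that suggestion, assuming keywords is a collection object from the official MongoDB Node.js driver: createIndex and updateOne are the current driver methods (the question's code uses the older .update() call), and the helper names ensureKeywordIndex and pushTweetId are made up here.

```javascript
// Sketch only: assumes `keywords` is a collection from the official
// MongoDB Node.js driver. Both helper names are hypothetical.
function ensureKeywordIndex(keywords) {
  // Ascending index on the field used in the update's query portion.
  // Calling createIndex again with the same spec leaves the existing index in place.
  return keywords.createIndex({ keyword: 1 });
}

function pushTweetId(keywords, keyWord, id) {
  // Same upsert as in the question; the { keyword: ... } lookup can use the index.
  return keywords.updateOne(
    { keyword: keyWord },
    { $push: { ids: id } },
    { upsert: true }
  );
}
```

Call ensureKeywordIndex once at startup, before the stream starts feeding pushTweetId.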
Upvotes: 9
Reputation: 9210
It is very likely that you are hitting a very common bottleneck in MongoDB. Since you are updating documents very frequently by adding strings, there is a good chance that you are running out of space for that document and forcing the database to constantly move it to a different location in memory/on disk by rewriting it at the tail end of the data file.
Adding indexes can only hurt write performance, so that will not help unless you are read-heavy.
I would consider changing your application logic to do this:
Pad the ids property by adding a whole bunch of fake strings to the array. Then, immediately after inserting the document, remove all of the IDs from that array. This will cause MongoDB to allocate additional room for the entire document, so that when you start adding IDs to the ids field, it will have plenty of room to grow.
[...] ids field
Upvotes: 12
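The padding idea described above could be sketched roughly as follows. This assumes keywords is a collection from the official MongoDB Node.js driver (insertOne, updateOne); the helper names, the placeholder count of 1000, and the 18-character placeholder string are arbitrary choices for illustration. Whether the server actually keeps the extra space depends on the MongoDB version and storage engine, which the thread does not specify.

```javascript
// Sketch only: hypothetical helpers illustrating the padding idea.
function buildPaddedKeywordDoc(keyword, padCount, padString) {
  // A document whose ids array is filled with throwaway placeholder strings.
  return { keyword: keyword, ids: new Array(padCount).fill(padString) };
}

function primeKeyword(keywords, keyword) {
  // 1000 placeholders of 18 characters each: arbitrary, tune to expected growth.
  const doc = buildPaddedKeywordDoc(keyword, 1000, "0".repeat(18));
  // Insert the padded document, then immediately empty the array again.
  return keywords.insertOne(doc).then(() =>
    keywords.updateOne({ keyword: keyword }, { $set: { ids: [] } })
  );
}
```

primeKeyword would run once per keyword before the first $push; after that the regular update can be used unchanged.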