toy

Reputation: 12141

MongoDB performance degrades significantly over time with upsert.

I'm using MongoDB as a cache right now. The application is fed 3 CSVs overnight, and the CSVs keep getting bigger because new products are added all the time. I've now reached 5 million records, and it takes about 2 hours to process everything. As the cache is refreshed every day, it will soon become impractical to refresh the data.

For example

CSV 1
ID, NAME
1, NAME!

CSV 2
ID, DESCRIPTION
1, DESC

CSV 3
ID, SOMETHING_ELSE
1, SOMETHING_ELSE

The application reads CSV 1 and puts it in the database. Then CSV 2 is read; if there is new information, it is added to the same document, otherwise a new record is created. The same logic applies to CSV 3. So one document gets different attributes from different CSVs, hence the upsert. After everything is done, all the documents are indexed.

Right now the first 1 million documents go in relatively quickly, but I can see the performance degrading considerably over time. I'm guessing it's because of the upsert, as MongoDB has to find the document and update its attributes, or create it if it doesn't exist. I'm using the Java driver and MongoDB 2.4. Is there any way I could improve this, or even do a batch upsert with the MongoDB Java driver?
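For reference, each CSV row currently becomes one upsert, roughly like this (simplified sketch using the 2.x Java driver; the database, collection, and field names are placeholders):

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    import com.mongodb.MongoClient;

    public class CsvUpsert {
        public static void main(String[] args) throws Exception {
            MongoClient client = new MongoClient("localhost", 27017);
            DB db = client.getDB("cache");                         // placeholder database name
            DBCollection products = db.getCollection("products");  // placeholder collection name

            // One upsert per CSV row: match on the product ID and $set whatever
            // fields this particular CSV provides (CSV 2/3 would set description, etc.).
            BasicDBObject query = new BasicDBObject("ID", "1");
            BasicDBObject update = new BasicDBObject("$set", new BasicDBObject("NAME", "NAME!"));

            // upsert = true (create the document if it does not exist), multi = false
            products.update(query, update, true, false);

            client.close();
        }
    }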

Upvotes: 0

Views: 436

Answers (1)

Daniel Coupal

Reputation: 825

What do you mean by 'after everything is done then all the documents will be indexed'? If you mean you want to add additional indexes at the end, doing it then is debatable, but it is fine. If you have absolutely no indexes during the load, then that is likely your issue.

You want to ensure that every insert/upsert you are doing uses an index. You can run one of the commands with .explain() to see whether an index is being used appropriately. You need an index, otherwise every insert/update scans the million documents already in the collection.
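As a rough sketch with the 2.x Java driver (assuming the CSV ID is stored in a field named ID; if you store it as _id it is already indexed), you can create the index and check the query plan like this:

    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;
    import com.mongodb.DBObject;

    public class IndexCheck {
        static void ensureIdIndexAndExplain(DBCollection products) {
            // Create an index on the field the upserts match on (a no-op if it already exists).
            products.ensureIndex(new BasicDBObject("ID", 1));

            // Explain a lookup by that field: "cursor" should report an index scan
            // (BtreeCursor) rather than BasicCursor, and "nscanned" should stay small.
            DBObject plan = products.find(new BasicDBObject("ID", "1")).explain();
            System.out.println(plan);
        }
    }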

Also, can you give more details about your application?

  1. are you going to do the import in 3 phases only once, or will you do frequent updates?
  2. do CSV2 and CSV3 modify a large percentage of the documents?
  3. do the modifications of CSV2 and CSV3 add or replace documents?
  4. what is the average size of your documents?

Let's assume you are doing a lot of updates on the same documents many times; for example, CSV2 and CSV3 update the same documents that CSV1 created. Instead of importing CSV1, then doing one set of updates for CSV2 and another for CSV3, you may want to simply keep the documents in the memory of your application, apply all the updates in memory, and then push the finished documents into the database. That assumes you have enough RAM for the operation, otherwise you will be hitting the disk again. A rough sketch of this approach follows below.
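For example, a sketch of that approach with the 2.x Java driver (assuming the cache collection starts empty each night, that each CSV has the ID in the first column, and using placeholder file and field names):

    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;
    import com.mongodb.DBObject;
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class MergeThenInsert {

        // Merge one CSV into the in-memory map: column 0 is the ID, column 1 the value.
        // Naive parsing for illustration only; a real CSV parser should handle quoting.
        static void mergeCsv(Map<String, BasicDBObject> docs, String path, String field) throws Exception {
            try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
                reader.readLine(); // skip the header line
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] cols = line.split(",", 2);
                    BasicDBObject doc = docs.computeIfAbsent(cols[0].trim(),
                            id -> new BasicDBObject("_id", id));
                    doc.put(field, cols[1].trim());
                }
            }
        }

        static void load(DBCollection products) throws Exception {
            Map<String, BasicDBObject> docs = new LinkedHashMap<>();
            mergeCsv(docs, "csv1.csv", "name");            // placeholder paths and field names
            mergeCsv(docs, "csv2.csv", "description");
            mergeCsv(docs, "csv3.csv", "somethingElse");

            // Push the fully merged documents as batches of plain inserts -- no upserts needed.
            List<DBObject> batch = new ArrayList<>();
            for (BasicDBObject doc : docs.values()) {
                batch.add(doc);
                if (batch.size() == 1000) {
                    products.insert(batch);
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                products.insert(batch);
            }
        }
    }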

Upvotes: 1
