Reputation: 487
I have a MongoDB that stores product data for 204,639,403 items; that data has already been split up, by the item's country, into four logical databases running on the same physical machine in the same MongoDB process.
Here is a list with the number of documents per logical database:
My problem is that the database write performance is getting worse; writes to the largest of the four databases (De) in particular have become really bad. According to iotop,
the mongod process uses 99% of the IO time with less than 3 MB of writes and 1.5 MB of reads per second. This leads to long database locks; 100%+ lock has become normal according to mongostat
- even when all processes writing to and reading from the other country databases have been stopped. The current slave reaches a load of up to 6 while the replica set master has a load of 2-3 at the same time, so it causes replication lag, too.
Each database has the same data and index structure; I am using the largest database (De) for the further examples only.
This is a random item taken from the database, just as an example; the structure is optimized to gather all important data with a single read:
{
    "_id" : ObjectId("533b675dba0e381ecf4daa86"),
    "ProductId" : "XGW1-E002F-DW",
    "Title" : "Sample item",
    "OfferNew" : {
        "Count" : 7,
        "LowestPrice" : 2631,
        "OfferCondition" : "NEW"
    },
    "Country" : "de",
    "ImageUrl" : "http://….jpg",
    "OfferHistoryNew" : [
        …
        {
            "Date" : ISODate("2014-06-01T23:22:10.940+02:00"),
            "Value" : {
                "Count" : 10,
                "LowestPrice" : 2171,
                "OfferCondition" : "NEW"
            }
        }
    ],
    "Processed" : ISODate("2014-06-09T23:22:10.940+02:00"),
    "Eans" : [
        "9781241461959"
    ],
    "OfferUsed" : {
        "Count" : 1,
        "LowestPrice" : 5660,
        "OfferCondition" : "USED"
    },
    "Categories" : [
        NumberLong(186606),
        NumberLong(541686),
        NumberLong(288100),
        NumberLong(143),
        NumberLong(15777241)
    ]
}
Typical queries range from simple ones, such as a lookup by ProductId or by an EAN only, to refinements by category sorted by the A rank, or refinements by category and an A rank range (1 up to 10,000 for example) sorted by the B rank, and so on; a rough sketch follows.
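As a rough sketch in the mongo shell (the RankA / RankB field names are taken from the index names in the stats below; the concrete filter values are made up for illustration):

// Lookup by ProductId or by EAN (both have their own index)
db.Item.find({ "ProductId" : "XGW1-E002F-DW" })
db.Item.find({ "Eans" : "9781241461959" })

// Refinement by category, sorted by the A rank
db.Item.find({ "Categories" : NumberLong(186606) }).sort({ "RankA" : 1 })

// Refinement by category plus an A rank range, sorted by the B rank
db.Item.find({
    "Categories" : NumberLong(186606),
    "RankA" : { "$gte" : 1, "$lte" : 10000 }
}).sort({ "RankB" : -1 })

These patterns line up with the Categories_1_RankA_-1 and Categories_1_RankA_1_RankB_-1 compound indexes shown in the stats.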
These are the stats from the largest DB:
{
    "ns" : "De.Item",
    "count" : 61216165,
    "size" : 43915150656,
    "avgObjSize" : 717,
    "storageSize" : 45795192544,
    "numExtents" : 42,
    "nindexes" : 6,
    "lastExtentSize" : 2146426864,
    "paddingFactor" : 1,
    "systemFlags" : 0,
    "userFlags" : 1,
    "totalIndexSize" : 41356824320,
    "indexSizes" : {
        "_id_" : 2544027808,
        "RankA_1" : 1718096464,
        "Categories_1_RankA_1_RankB_-1" : 16383534832,
        "Eans_1" : 2846073776,
        "Categories_1_RankA_-1" : 15115290064,
        "ProductId_1" : 2749801376
    },
    "ok" : 1
}
It is worth mentioning that the total index size is nearly as large as the storage size, i.e. indexes account for almost half of the on-disk footprint.
Each country DB has to handle 3-5 million updates/inserts per day; my target is to perform the write operations in less than five hours during the night.
Currently it's a replica set with two servers, each with 32 GB RAM and a RAID1 of 2 TB HDDs. Simple optimizations like the deadline I/O scheduler and noatime have already been made.
I have worked out some optimization strategies:
But there should be other optimization strategies, too, that did not come to my mind, and which I would like to hear about!
Which optimization strategy sounds most promising, or is a mixture of several optimizations needed here?
Upvotes: 27
Views: 42739
Reputation: 2978
Would you consider using a database with better throughput that supports documents? I've heard success stories with TokuMX. And FoundationDB (where I'm an engineer) has very good performance with highly concurrent write loads and large documents. Happy to answer further questions about FoundationDB.
Upvotes: 0
Reputation: 351
Most likely you are running into issues due to record growth; see http://docs.mongodb.org/manual/core/write-performance/#document-growth.
Mongo prefers records of fixed (or at least bounded) size. Increasing the record size beyond the pre-allocated storage will cause the document to be moved to another location on disk, multiplying your I/O with each write. Consider pre-allocating "enough" space for your average document on insert, if your document sizes are relatively homogeneous. Otherwise, consider splitting rapidly growing nested arrays into a separate collection, thereby replacing updates with inserts. Also check your fragmentation and consider compacting your databases from time to time, so that you have a higher density of documents per block, which will cut down on hard page faults. Both ideas are sketched below.
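A minimal mongo-shell sketch of both ideas, assuming MMAPv1-era storage as in the question; the padding size and the OfferHistory collection name are made up for illustration:

// 1) Pre-allocate space on insert so the record can grow in place
//    (the ~2 KB padding size is only a guess; tune it to your growth rate).
var padding = new Array(2048).join("x");          // ~2 KB filler string
db.Item.insert({
    "ProductId" : "XGW1-E002F-DW",
    "Title"     : "Sample item",
    "Country"   : "de",
    "_padding"  : padding
});
// Removing the filler keeps the larger record allocation (MMAPv1),
// leaving headroom for future in-place updates.
db.Item.update({ "ProductId" : "XGW1-E002F-DW" }, { "$unset" : { "_padding" : 1 } });

// 2) Move the growing OfferHistoryNew array into its own (hypothetical)
//    OfferHistory collection, turning document-growing updates into inserts.
db.OfferHistory.insert({
    "ProductId" : "XGW1-E002F-DW",
    "Date"      : new Date(),
    "Value"     : { "Count" : 10, "LowestPrice" : 2171, "OfferCondition" : "NEW" }
});
db.OfferHistory.ensureIndex({ "ProductId" : 1, "Date" : -1 });

// 3) Occasionally defragment the collection (blocks the database while it runs).
db.runCommand({ "compact" : "Item" });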
Upvotes: 20