Reputation: 996
What is the best way to implement a commenting system (huge data writing)?
1) Use an RDBMS such as MySQL with 2 tables, one for the topics and one for the comments. Pros: inserting a new comment is fast, efficient and simple, and indexing is efficient. Cons: scaling out (horizontal scaling) is hard.
2) Use a NoSQL database such as CouchDB or MongoDB. Pros: scaling out (horizontal scaling) is easy, it supports huge data writes, and it is schemaless. Cons: I think that inserting new data is not as fast and efficient as in an RDBMS.
For example, to update a CouchDB document you need to grab the whole document, update it locally, then submit it again, and the document will be huge, so it will consume bandwidth.
Also, I think that CouchDB in-place updates and MongoDB updates would be slow and not as efficient as in an RDBMS.
Also, when you want to get the comments of each user across various topics, I think the search would be faster in an RDBMS than in a NoSQL system.
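To make option 1 concrete, here is a minimal sketch of the two-table layout using Python's built-in sqlite3 module as a stand-in for MySQL. The table names, column names and helper functions are all hypothetical, chosen only for illustration; a real MySQL schema would differ in types and syntax.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE topics (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    body  TEXT NOT NULL
);
CREATE TABLE comments (
    id         INTEGER PRIMARY KEY,
    topic_id   INTEGER NOT NULL REFERENCES topics(id),
    author     TEXT NOT NULL,
    body       TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Indexes for the two lookups discussed: by topic and by user.
CREATE INDEX idx_comments_topic  ON comments(topic_id);
CREATE INDEX idx_comments_author ON comments(author);
"""


def open_db():
    """Create an in-memory database with the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_comment(conn, topic_id, author, body):
    """Adding a comment is a single-row insert; the topic row is not touched."""
    cur = conn.execute(
        "INSERT INTO comments (topic_id, author, body) VALUES (?, ?, ?)",
        (topic_id, author, body),
    )
    conn.commit()
    return cur.lastrowid


def comments_by_user(conn, author):
    """Comments of one user across all topics, oldest first."""
    rows = conn.execute(
        "SELECT t.title, c.body FROM comments c "
        "JOIN topics t ON t.id = c.topic_id "
        "WHERE c.author = ? ORDER BY c.id",
        (author,),
    )
    return rows.fetchall()


conn = open_db()
conn.execute("INSERT INTO topics (id, title, body) VALUES (1, 'First topic', 'text')")
conn.execute("INSERT INTO topics (id, title, body) VALUES (2, 'Second topic', 'text')")
add_comment(conn, 1, "user1", "bla1")
add_comment(conn, 2, "user1", "bla2")
add_comment(conn, 1, "user2", "bla3")
print(comments_by_user(conn, "user1"))
```

The per-user lookup is one indexed query with a join, which is the case where I expect the RDBMS to do well.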
Here is a sample CouchDB document [one document per topic]:
{"_id":"doc id",
"_rev":"45521231465421",
"topic_title":"the title of the topic",
"topic_body":"the body of the topic",
"comments":[
{"date":"mm/dd/yy hh:mm:ss", "comment":"bla1", "user":"user1"},
{"date":"mm/dd/yy hh:mm:ss", "comment":"bla2", "user":"user2"},
{"date":"mm/dd/yy hh:mm:ss", "comment":"bla3", "user":"user3"},
{"date":"mm/dd/yy hh:mm:ss", "comment":"bla4", "user":"user4"},
{"date":"mm/dd/yy hh:mm:ss", "comment":"bla5", "user":"user5"},
{"date":"mm/dd/yy hh:mm:ss", "comment":"bla6", "user":"user6"}
]
}
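The fetch, modify and resubmit cycle that worries me can be sketched as below. This is not a CouchDB client: `TinyDocStore` is a hypothetical in-memory stand-in that only mimics the revision check (a write with a stale `_rev` is rejected), and its integer revision counter is a simplification of real CouchDB revision strings.

```python
import copy


class ConflictError(Exception):
    """Raised when a write carries a stale revision."""


class TinyDocStore:
    """In-memory stand-in for a revisioned document store (not a CouchDB client)."""

    def __init__(self):
        self._docs = {}

    def put(self, doc):
        current = self._docs.get(doc["_id"])
        if current is not None and current["_rev"] != doc.get("_rev"):
            raise ConflictError(doc["_id"])
        stored = copy.deepcopy(doc)
        # Toy revision counter; real CouchDB revisions look different.
        stored["_rev"] = (current["_rev"] + 1) if current else 1
        self._docs[doc["_id"]] = stored
        return stored["_rev"]

    def get(self, doc_id):
        return copy.deepcopy(self._docs[doc_id])


def add_comment(store, topic_id, user, text):
    doc = store.get(topic_id)  # 1. fetch the whole topic document
    doc["comments"].append({"user": user, "comment": text})  # 2. modify locally
    return store.put(doc)  # 3. send the whole document back


store = TinyDocStore()
store.put({"_id": "topic-1", "topic_title": "title", "comments": []})
add_comment(store, "topic-1", "user1", "bla1")

stale = store.get("topic-1")  # a second writer reads the document
add_comment(store, "topic-1", "user2", "bla2")  # meanwhile another comment lands
stale["comments"].append({"user": "user3", "comment": "bla3"})
try:
    store.put(stale)  # rejected: the revision is out of date
    conflict = False
except ConflictError:
    conflict = True
print("conflict:", conflict)
```

Every comment moves the full document twice, and on a busy topic the second writer has to fetch and retry, which is the cost I am worried about with one document per topic.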
Upvotes: 2
Views: 3302
Reputation: 43884
I think that the insertion of new data is not fast and efficient as the RDBMS
You have hit on something there. The insertion speed of NoSQL databases depends on your scenario. I cannot make that clear enough: many people expect MongoDB to just perform magically faster than SQL and are sorely disappointed when it does not. In fact, the mongodb-user Google group has been filled with such people.
For example to update couchdb
Not only that, but CouchDB also uses versioning and JSON, which is not as efficient as storing the data in SQL and will consume more space per record.
Mongodb updates would be slow and won't be efficient as in RDBMS
Schema, Queries, Schema, Queries...
That is what it comes down to. Ask yourself one question.
Will I be expecting a lot of comments per post?
If so, the in-memory (yes, in-memory) $push, $pull and other subdocument operators may get slow on a large subdocument (let's be honest, they will).
Not only that, but consistently growing documents can be a problem: they can cause heavy fragmentation and wasted space, creating a "swiss cheese" effect that slows your system down massively (bringing it to a grinding halt). This presentation should help you understand more about how storage really works: http://www.10gen.com/presentations/storage-engine-internals
So you already know that, if used wrong, subdocuments can be a bad idea. That being said, you could partially remedy it with power of 2 sizes allocation: http://docs.mongodb.org/manual/reference/command/collMod/#usePowerOf2Sizes but if you are getting way too many comment insertions then it won't help much.
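To show why rounding record sizes up to a power of 2 helps a growing document, here is a toy model in Python. It is only a sketch of the idea, not the storage engine's actual algorithm; the function names, the 32-byte minimum and the byte counts are assumptions made for the example.

```python
def power_of_2_size(nbytes, minimum=32):
    """Round a record size up to the next power of two (simplified model)."""
    size = minimum  # assumed smallest slot, for illustration only
    while size < nbytes:
        size *= 2
    return size


def exact_fit(nbytes):
    """Allocate exactly what the record needs, leaving no room to grow."""
    return nbytes


def moves_needed(start, growth, steps, allocator):
    """Count how often a growing record outgrows its slot and must be moved."""
    allocated = allocator(start)
    size, moves = start, 0
    for _ in range(steps):
        size += growth  # e.g. one more embedded comment
        if size > allocated:
            allocated = allocator(size)  # relocate into a bigger slot
            moves += 1
    return moves


# A 200-byte topic document that gains a 100-byte comment twenty times.
print("exact fit: ", moves_needed(200, 100, 20, exact_fit))
print("power of 2:", moves_needed(200, 100, 20, power_of_2_size))
```

In this model the exact-fit record is moved on every one of the 20 appends, while the padded one is moved 4 times. Each move still leaves a hole behind, which is why padding only partially remedies heavy comment insertion.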
I personally would not embed this relationship.
So I would go for the same set-up as in an RDBMS, and now you start to see the problem. Insertions would probably be about the same speed if it wasn't for MongoDB's fsync queue, unlike SQL, which writes straight to disk. You can set up MongoDB with journalled writes, but then you will probably get the same performance metrics as SQL at the end of the day.
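To illustrate the difference between the two shapes, here is a small Python sketch that only builds the documents and update specifications as plain dicts. It does not talk to a server; the helper names, the collection names in the trailing comment and the field names (borrowed from the question's sample) are assumptions.

```python
from datetime import datetime, timezone


def embedded_push(topic_id, user, text):
    """Filter and update for appending to an embedded comments array."""
    comment = {"user": user, "comment": text, "date": datetime.now(timezone.utc)}
    # The topic document itself grows with every comment.
    return {"_id": topic_id}, {"$push": {"comments": comment}}


def separate_comment(topic_id, user, text):
    """A standalone comment document that references its topic."""
    return {
        "topic_id": topic_id,
        "user": user,
        "comment": text,
        "date": datetime.now(timezone.utc),
    }


def comments_by_user_filter(user):
    """Query filter for every comment of one user, across all topics."""
    return {"user": user}


flt, update = embedded_push(1, "user1", "bla1")
doc = separate_comment(1, "user1", "bla1")
print(flt, sorted(update))
print(sorted(doc))

# With pymongo these would be used roughly as follows (not executed here,
# and the collection names are hypothetical):
#   db.topics.update_one(*embedded_push(1, "user1", "bla1"))
#   db.comments.insert_one(separate_comment(1, "user1", "bla1"))
#   db.comments.find(comments_by_user_filter("user1"))
```

The separate-collection shape keeps each write a small fixed-size insert, the same trade-off as the comments table in the RDBMS design.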
As for querying, this is where MongoDB can still come out on top, providing your working set fits into RAM. I cannot bold that last bit enough!!
Unlike SQL, MongoDB maps everything (your entire data) to virtual memory, not RAM and definitely not to be confused with RAM. This does make it faster for larger lookups; for smaller lookups the speed will be about the same, because both will be serving from an in-memory cache.
Also when you want to get the comments of the each user in various topics I think the search would be faster in RDBMS than in the nosql system.
If the topic id is in the comment document it would definitely be faster in MongoDB, providing your working set is ready in RAM.
What is meant by the working set? Here is a good answer: What does it mean to fit "working set" into RAM for MongoDB?
Hope this helps,
Upvotes: 5
Reputation: 27526
I can speak only about MongoDB, and you are indeed wrong about inserts. Here is a nice comparison of Mongo with MSSQL, in which Mongo performs 100 times better than MSSQL. So it's quite suitable for large data processing.
Searching is also much faster (what would be the whole point of NoSQL if inserting and searching weren't faster?) - but with one caveat: you can't perform joins in queries, so you have to join tables manually in your application (but there is a recommended workaround - nested documents).
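As a rough idea of what joining manually in the application looks like, here is a minimal Python sketch over plain lists of dicts, standing in for the results of two separate queries. The field names (`user_id`, `name`) and the function name are hypothetical.

```python
def join_comments_with_users(comments, users):
    """Attach each comment's author name, as a JOIN would in SQL."""
    # Build a lookup table once so each comment is matched in constant time.
    users_by_id = {u["_id"]: u for u in users}
    joined = []
    for c in comments:
        user = users_by_id.get(c["user_id"])
        joined.append({**c, "user_name": user["name"] if user else None})
    return joined


users = [{"_id": 1, "name": "alice"}, {"_id": 2, "name": "bob"}]
comments = [
    {"user_id": 1, "comment": "bla1"},
    {"user_id": 2, "comment": "bla2"},
    {"user_id": 3, "comment": "bla3"},  # author missing from the users result
]
for row in join_comments_with_users(comments, users):
    print(row)
```

This costs two round trips to the database instead of one, which is the price of not having joins; nesting the data in one document avoids it.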
Upvotes: 2