Reputation: 5597
We are planning to move some of the writes our back-end does from an RDBMS to NoSQL, as we expect them to be the main bottleneck.
Our business process is 95%-99% concurrent writes and only 1%-5% concurrent reads on average. There will be a massive amount of data involved, so an in-memory NoSQL DB won't fit.
Which on-disk NoSQL DB would be optimal for this case?
Thanks!
Upvotes: 1
Views: 5522
Reputation: 96
If the concurrent writes are creating conflicts and data integrity is an issue, NoSQL probably isn't the way to go. You can easily test this with a database that supports "optimistic concurrency", because it lets you measure the real-life locking conflicts and analyze them in detail.
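To make that measurable, here is a minimal sketch in C of the version-checked update pattern behind optimistic concurrency. All names here (record_t, optimistic_update, the balance payload) are hypothetical; a real database does this per row or document with an atomic commit. Counting the "conflict" returns under your real workload is exactly the measurement I mean:

```c
/* A minimal sketch of optimistic concurrency control, assuming one
 * in-process record guarded by a version counter. Hypothetical names;
 * not any particular database's implementation. Build with -pthread. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    pthread_mutex_t lock;     /* guards only the short snapshot/commit phases */
    unsigned        version;  /* bumped on every successful write */
    int             balance;  /* the payload being updated */
} record_t;

/* Snapshot, do the work without the lock, then validate the version at
 * commit time. Each 'false' return is one real-life conflict to count. */
static bool optimistic_update(record_t *rec, int delta)
{
    pthread_mutex_lock(&rec->lock);
    unsigned seen = rec->version;             /* optimistic snapshot */
    int      snap = rec->balance;
    pthread_mutex_unlock(&rec->lock);

    int proposed = snap + delta;              /* "slow" work, no lock held */

    pthread_mutex_lock(&rec->lock);
    bool ok = (rec->version == seen);         /* did anyone write meanwhile? */
    if (ok) {
        rec->balance = proposed;
        rec->version++;
    }
    pthread_mutex_unlock(&rec->lock);
    return ok;                                /* false => conflict: retry or log */
}

int main(void)
{
    static record_t rec = { PTHREAD_MUTEX_INITIALIZER, 0, 100 };
    printf("update %s, balance = %d\n",
           optimistic_update(&rec, 42) ? "committed" : "conflict", rec.balance);
    return 0;
}
```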
I am a little surprised that you say you "expect problems" without giving any further details. Let me answer based on the facts you gave us: what are the 100,000 sources, and what does the write scenario look like? MySQL is not the best example of handling scalable concurrent writes.
It would be helpful if you could provide some use case(s), or anything else that helps us understand the problem in detail.
Let me give two examples. First: an in-memory database with an advanced write dispatcher, data versioning, etc. can easily take 1M "writers", the writers being network elements and the application an advanced NMS (network management system). Lots of writes, no conflicts, optimistic concurrency, in-memory write buffering of up to 16 GB, and asynchronous parallel writing to 200+ virtual spindles (SSD or magnetic disks). A real "sucker" for eating new data! An excellent candidate for scaling performance to its limits.
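For illustration, here is a toy sketch in C of the write-buffering part of such a design: writers append into RAM and return immediately, while a background thread flushes sealed buffers to disk. The 1 MiB buffers, the single flusher thread, and the writes.log file name are assumptions made for brevity; the system described above buffered gigabytes and fanned out to 200+ spindles.

```c
/* A toy double-buffered write dispatcher: writers never touch the disk,
 * a background flusher does. Hypothetical sizes and names throughout. */
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BUF_SZ (1u << 20)          /* 1 MiB demo buffers (real: gigabytes) */

static char   buf[2][BUF_SZ];      /* double buffering: fill one, flush one */
static size_t fill;                /* bytes used in the active buffer */
static int    active;              /* buffer writers currently append to */
static int    sealed = -1;         /* buffer index awaiting flush, -1 = none */
static size_t sealed_len;
static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;

/* Writer path: a memcpy into RAM; no disk latency on this path. */
static void buffered_write(const void *data, size_t len)
{
    pthread_mutex_lock(&mu);
    if (fill + len > BUF_SZ) {          /* active buffer full: seal it */
        while (sealed != -1)            /* back-pressure if flusher lags */
            pthread_cond_wait(&cv, &mu);
        sealed = active;
        sealed_len = fill;
        active ^= 1;
        fill = 0;
        pthread_cond_signal(&cv);       /* wake the flusher */
    }
    memcpy(buf[active] + fill, data, len);
    fill += len;
    pthread_mutex_unlock(&mu);
}

/* Flusher thread: the only place that touches the disk. */
static void *flusher(void *arg)
{
    (void)arg;
    FILE *out = fopen("writes.log", "ab");   /* hypothetical target file */
    for (;;) {
        pthread_mutex_lock(&mu);
        while (sealed == -1)
            pthread_cond_wait(&cv, &mu);
        int    idx = sealed;
        size_t len = sealed_len;
        pthread_mutex_unlock(&mu);

        fwrite(buf[idx], 1, len, out);       /* asynchronous from writers' view */
        fflush(out);

        pthread_mutex_lock(&mu);
        sealed = -1;
        pthread_cond_broadcast(&cv);         /* unblock stalled writers */
        pthread_mutex_unlock(&mu);
    }
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, flusher, NULL);
    for (int i = 0; i < 100000; i++)         /* ~1.6 MB: forces one flush */
        buffered_write("0123456789abcdef", 16);
    sleep(1);                                /* let the flusher drain */
    return 0;
}
```

The point of the double buffer is that the writer path never waits for the disk unless the flusher falls a full buffer behind, which is what the back-pressure wait is for.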
Second example: an MSC (Mobile Switching Centre) with a sparse number space, e.g. mobile numbers forming "clusters" of numbers. A huge number space, but a maximum of 200M individual addresses, and very rare conflicting writes. The RDBMS was replaced with memory-mapped sparse files, and the performance improvement was close to 1000x (yes, 1000x) in the best case, and "only" 100x in the worst case. The replacement code was roughly 300 lines of C. That was true "big NoSQL", because it was a good fit for the problem being solved.
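To make the sparse-file trick concrete, here is a minimal POSIX sketch (nowhere near the original ~300 lines): one huge file is mapped once, and each number indexes a fixed-size slot. The file name, slot layout, and example key are assumptions; the 200M address ceiling comes from the example above. Because the file is sparse, untouched slots never consume disk blocks.

```c
/* Minimal memory-mapped sparse-file store: each number owns a fixed
 * 32-byte slot at a computed offset. 64-bit POSIX build assumed. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_NUMBERS 200000000ULL              /* max. 200M individual addresses */

typedef struct { uint32_t flags; char payload[28]; } slot_t;  /* 32 bytes */

int main(void)
{
    size_t size = (size_t)MAX_NUMBERS * sizeof(slot_t);  /* ~6.4 GB virtual, sparse on disk */
    int fd = open("numbers.db", O_RDWR | O_CREAT, 0644);
    if (fd < 0 || ftruncate(fd, (off_t)size) != 0) { perror("setup"); return 1; }

    slot_t *slots = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (slots == MAP_FAILED) { perror("mmap"); return 1; }

    /* A "write" is just a store into the mapping: no SQL, no locking,
     * and the kernel persists touched pages lazily. */
    uint64_t number = 123456789ULL % MAX_NUMBERS;   /* hypothetical key */
    slots[number].flags = 1;
    strncpy(slots[number].payload, "subscriber-record",
            sizeof slots[number].payload - 1);

    msync(slots, size, MS_ASYNC);   /* optional: hint the kernel to flush */
    munmap(slots, size);
    close(fd);
    return 0;
}
```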
So, in short: without knowing more details, there is no "silver bullet" answer to your question. We're not hunting werewolves here; it's just big, bad data. As long as we don't know whether your workload is "transactional" (i.e. sensitive to the number of IOs and to latency) or "BLOB-like" (i.e. streaming media, geodata, etc.), any promise would be 100% wrong. Bandwidth and IO rate/latency/transactions are more or less a trade-off in real life.
See for example http://publib.boulder.ibm.com/infocenter/soliddb/v6r3/index.jsp?topic=/com.ibm.swg.im.soliddb.sql.doc/doc/pessimistic.vs.optimistic.concurrency.control.html for some further details.
Upvotes: 3