Marcel Sufryd

Reputation: 21

Low CouchDB performance for randomized operations

I am benchmarking the performance of CouchDB and MongoDB using the YCSB benchmarking tool. Unfortunately, it seems I am doing something wrong, because the difference in performance for single, random operations is huge:

Workload A (50/50 read/update), 16 query threads, 120 sec runtime (results are very similar with 20 minute runtimes):

CouchDB 1.6.1: Overall throughput: 1076 ops/sec, 99th percentile read latency of 13ms, 99th percentile update latency of 13ms

MongoDB 3.0.6: Overall throughput: 11203 ops/sec, 99th percentile read latency of 1ms, 99th percentile update latency of 1ms

As you can see, CouchDB is terribly slow for randomized reads and updates. The documentation recommends using bulk operations, which might be fine for inserts, but I do not see how I would realize bulk reads, considering YCSB issues reads one by one.
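For reference, here is a rough sketch of what a batched read could look like against CouchDB's _all_docs endpoint, in case anyone sees a way to map YCSB's one-by-one reads onto it (the database name and document IDs are placeholders):

```python
# Rough sketch of batching point reads via CouchDB's _all_docs endpoint,
# assuming CouchDB 1.6 on localhost and a database named "ycsb" (placeholder).
import requests

COUCH_URL = "http://localhost:5984/ycsb"

def bulk_read(doc_ids):
    # One POST with a "keys" body fetches many documents in a single round trip.
    resp = requests.post(
        COUCH_URL + "/_all_docs",
        params={"include_docs": "true"},
        json={"keys": doc_ids},
    )
    resp.raise_for_status()
    # Rows for missing keys carry an "error" field and no "doc".
    return [row.get("doc") for row in resp.json()["rows"]]

print(bulk_read(["user1000", "user1001", "user1002"]))
```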

Testing environment:

What I have tried to improve throughput:

Possible explanations for CouchDB's slowness:

Question: Do you see any other ways of improving CouchDB's performance?

Edit: delayed_commits is set to true in CouchDB, so I am starting to doubt forced fsync as the cause.
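That setting can be double-checked over CouchDB 1.x's HTTP config API; a quick probe, assuming the default port and no admin authentication:

```python
# Quick check of the delayed_commits setting via CouchDB 1.x's _config API,
# assuming the default port 5984 and no admin credentials.
import requests

r = requests.get("http://localhost:5984/_config/couchdb/delayed_commits")
print(r.json())  # "true" means commits are batched instead of fsynced per write
```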

Upvotes: 1

Views: 706

Answers (1)

Kxepal

Reputation: 4679

The answer here is simple: CouchDB ensures that every write hits disk with an fsync() call, while MongoDB is allowed to keep writes in memory for a while and tell you that everything is fine, until the next accidental shutdown, when you lose your data. RAM versus disk is the main performance factor between them.
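One way to see how much of the gap is durability rather than raw speed would be to rerun the MongoDB side with journaled write acknowledgement, so both systems wait for disk before confirming. A minimal pymongo sketch (the database and collection names are assumptions, not necessarily what YCSB uses):

```python
# Sketch: make MongoDB wait for the journal before acknowledging a write,
# which brings its durability guarantee closer to CouchDB's per-write fsync.
# Database and collection names are placeholders.
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient("localhost", 27017)
coll = client["ycsb"].get_collection(
    "usertable",
    write_concern=WriteConcern(j=True),  # acknowledge only after journal commit
)
coll.update_one({"_id": "user1000"}, {"$set": {"field0": "new-value"}})
```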

Next comes the protocol: HTTP is text-based, while MongoDB uses its own binary protocol. Needless to say, binary protocols are more compact and efficient.

But the main problem here is that your benchmark is synthetic. You assume that your database is used for simple reads and writes, like a data bag, while real databases are used for more complex operations: queries, index lookups, joins, data validation and so on. And there, business logic matters.

For a more realistic benchmark, you should take some application, make it work with both databases, and benchmark the business workflow with them, not blind reads and writes. I am pretty sure your numbers will even out, because business logic is much slower than any database.

So I'm sorry that you wasted your time on this.

Upvotes: 1
