VinayBS

Reputation: 409

Using Couchbase for large data sets

I am evaluating NoSQL databases for my project. Below are the requirements.

  1. We will have huge datasets of around 600 GB distributed across different nodes in a cluster.
  2. We need around 1k read operations per second.
  3. We are looking for a highly available, fault-tolerant, self-healing solution.

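As a rough sanity check on these requirements, the sketch below spreads the 600 GB and 1k reads/s evenly across a cluster. The replica count (1) and node count (4) are hypothetical choices, not part of the requirements above, and the estimate ignores metadata, index, and compaction overhead, so treat it as a lower bound on what each node must hold.

```python
def per_node_load(total_gb: float, replicas: int, nodes: int, reads_per_sec: float) -> tuple[float, float]:
    """Rough even-distribution estimate (hypothetical helper).

    Ignores per-item metadata, indexes and compaction overhead.
    """
    if nodes < replicas + 1:
        # Each copy of a document should live on a different node.
        raise ValueError("need at least replicas + 1 nodes")
    stored_gb = total_gb * (1 + replicas)   # active copy plus replica copies
    gb_per_node = stored_gb / nodes
    reads_per_node = reads_per_sec / nodes  # assumes reads hit active copies, spread evenly
    return gb_per_node, reads_per_node


# Hypothetical cluster: 1 replica, 4 nodes
gb, rps = per_node_load(600, replicas=1, nodes=4, reads_per_sec=1000)
print(f"{gb:.0f} GB and {rps:.0f} reads/s per node")
```

With these assumed settings each node stores about 300 GB and serves about 250 reads/s, which is a modest per-node load; adding nodes lowers both figures proportionally.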
I narrowed it down to Cassandra and Couchbase, and then chose Couchbase based on the following factors:

  1. Couchbase's read performance is better than Cassandra's.
  2. Cluster management is better in Couchbase.

My question is: will Couchbase be able to handle huge datasets? I have not been able to find much information about this online.

Upvotes: 3

Views: 1090

Answers (2)

a4xrbj1

Reputation: 425

First of all, 600 GB was once considered a huge dataset, but it isn't anymore.

I handle telecommunications data (Call Detail Records), roughly 12 billion records per month. With a node.js program as the back end doing some serious processing of the data for my new loyalty program, I'm able to run it on my 2013 MacBook Pro (16 GB of RAM, of which Couchbase takes only a small part, and an SSD) at between 1,000 and 1,200 calls per second.

That means 1,000-1,200 calls per second reach the node.js program, which in turn issues more reads and writes against the Couchbase database (still version 2.x, by the way). There are periods in between where the Couchbase server drops to zero transactions, because I'm feeding the data from the same MacBook Pro, and pushing it to my app is slower than Couchbase and Node.js can process it.

So it's not necessary to run many nodes for the setup you're aiming for, and Couchbase scales linearly, well beyond what other NoSQL databases can do. There are two whitepapers on this, showing that MongoDB and Cassandra top out at 8k (MongoDB) and 12k (Cassandra) transactions per second while Couchbase keeps going strong.

The first, "Benchmarking Couchbase Server for Interactive Applications" by Altoros Systems, plots read latency against throughput. Cassandra starts at 2 ms (at 1k reads), is at 4 ms from 7k to 10k reads, and ends the test at 12k reads with 6 ms.

Couchbase, on the other hand, stays below 1 ms up to 16k reads and only then shows a slowdown, reaching 1.5 ms at 20k reads and 2.5 ms at 21k, where the chart ends.
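One way to relate the two axes of that chart is Little's law: the average number of requests in flight equals throughput multiplied by average latency. The sketch below applies it to the figures quoted in the text, assuming they are mean latencies under steady load (the whitepaper may report a different statistic). It is a back-of-the-envelope aid, not part of the benchmark methodology; the "below 1 ms" Couchbase figure is treated as 1 ms, so its result is an upper bound.

```python
def in_flight(throughput_per_sec: float, latency_ms: float) -> float:
    """Little's law: mean outstanding requests = arrival rate x mean time in system."""
    return throughput_per_sec * (latency_ms / 1000.0)


# Approximate figures quoted from the Altoros chart
couchbase = in_flight(16_000, 1.0)   # "below 1 ms" at 16k reads, so this is an upper bound
cassandra = in_flight(12_000, 6.0)   # 6 ms at 12k reads
print(f"Couchbase: ~{couchbase:.0f} requests in flight")
print(f"Cassandra: ~{cassandra:.0f} requests in flight")
```

Under these assumptions, sustaining a higher throughput at lower latency means the client needs fewer concurrent outstanding requests, which is why latency and throughput are usually read together on such charts.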

MongoDB isn't even in the same league as Cassandra or Couchbase in this comparison. You will find all the details of the test setup in the whitepaper.

The other whitepaper, "Comparing Couchbase Server 3.0.2 with MongoDB 3.0: Benchmark Results and Analysis", is from Avalon Consulting LLC. It's more recent and compares the then-latest versions (especially the new MongoDB version).

To quote from it: "Couchbase Server provided 4x better read latency than MongoDB with the same number of concurrent clients - 245. Like throughput, concurrency is important. MongoDB latency increased by over 50% as the number of concurrent clients was increased by 50%. However, Couchbase Server latency increased by much smaller margins - as little as 10%."

PM me if you can't find them online and I can email both whitepapers to you. I researched them as part of deciding which NoSQL solution was right for my use case.

Disclaimer: I'm not affiliated with any of the companies mentioned above; I'm just a user.

Upvotes: 4

hookenz

Reputation: 38889

Absolutely. Couchbase stores its data in what it calls buckets.

According to http://docs.couchbase.com/admin/admin/Misc/limits.html, the maximum bucket size is unlimited.

There have been questions about whether your data can exceed the memory you assign to the bucket. Yes, it can.
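Because the data can exceed the bucket's RAM quota, a useful planning number is what fraction of it could stay cached in memory, with the remainder served from disk. The sketch below computes that upper bound; the 32 GB per-node quota, 4 nodes and 1 replica are hypothetical values, and per-item metadata (which also consumes quota) is ignored, so the real cached share would be lower.

```python
def cached_fraction(data_gb: float, ram_quota_gb_per_node: float, nodes: int, replicas: int = 1) -> float:
    """Upper bound on the share of stored data (active + replica copies) that fits in the bucket's RAM quota.

    Hypothetical helper; ignores per-item metadata overhead.
    """
    stored_gb = data_gb * (1 + replicas)
    total_quota_gb = ram_quota_gb_per_node * nodes
    return min(1.0, total_quota_gb / stored_gb)


# Hypothetical cluster: 4 nodes with a 32 GB bucket quota each, 1 replica
print(f"{cached_fraction(600, ram_quota_gb_per_node=32, nodes=4):.0%} of the data can be cached")
```

Whether a small cached share is acceptable depends on how concentrated the reads are: if the 1k reads/s mostly hit a small working set, that set can stay in memory even when the full 600 GB does not.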

Upvotes: 2
