HiChews123

Reputation: 1686

Couchbase - Bulk insert via Java SDK - How to scale with Akka?

I'm tasked with scaling a system that needs to do a high volume of inserts into a Couchbase server. I'm using Couchbase Server 2.5 and Couchbase Java Client 1.4.4.

I expect to receive around 100K messages from a message queue, which I'll pull off and persist into Couchbase as fast as possible. I intend to introduce concurrency by leveraging a framework like Akka. My plan is to spawn a new actor for every message and have it persist that message, so at any given point in time it's theoretically possible that I'll have > 100K live actors in the system, all concurrently trying to persist their messages via the Couchbase client.

A few questions:

  1. How should I think about resource contention here? Assuming a 4 core machine, theoretically only 4 writes can happen in a truly parallel way.
  2. Let's assume that my Couchbase cluster is running on excellent hardware and should be able to scale to >100K requests very quickly. My theory is that I may run into bottlenecks on the client side, and if I do ...
  3. How should I scale my client to be able to do that many (or more) writes without timing out? Is there any way to tune my thread pools on the client side?
  4. Lastly, should I introduce some way of throttling my writes to take "pressure" off the Couchbase client?
  5. What else am I missing when thinking about how to scale this properly, and gracefully, free from unexpected errors/resource leaks?

Thank you!

Upvotes: 0

Views: 1756

Answers (1)

David Ostrovsky

Reputation: 2481

First of all, you should really switch to the new Java client - 2.1.2 at the time of this writing. It's faster, has fewer dependencies, and makes it much easier to reason about concurrency.

  1. That's not exactly true, because the client buffers operations internally and has several IO threads processing the work queue in bulk. So you are not limited to the number of cores in concurrent writes. Take a look at the Java bulk insert example here: http://docs.couchbase.com/developer/java-2.1/documents-bulk.html
  2. You may run into a CPU bottleneck on the client side, especially if you're serializing objects. There's no magic solution here; you'll need to scale out the client side.
  3. While you can configure the client worker thread count and various other parameters, you should first try it with the default settings. If you see something that's not working, you can diagnose and adjust later, but the default settings are generally good for most cases. Also, because the client buffers operations, there is usually no need to do any manual client pooling. You'll want to be careful about creating too many instances of the client; for example, you definitely shouldn't create an instance per actor in your case. You can find documentation for all the advanced config options here: http://docs.couchbase.com/developer/java-2.1/env-config.html
  4. The 2.x Java client actually throws BackpressureException under high load and in failure scenarios. You'll need to handle it, possibly with some sort of exponential back-off retry.
  5. Switch to the 2.x client version, read all the docs (http://docs.couchbase.com/developer/java-2.1/java-intro.html) :)
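
To make point 4 a bit more concrete, here is a rough sketch of an exponential back-off retry using only the JDK. It does not touch the Couchbase API at all: OverloadedException is a made-up stand-in for whatever overload signal your client raises (BackpressureException in the 2.x client), and the class name, method names, attempt count, and delays are all illustrative assumptions rather than recommended values.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public class BackoffRetry {

    /** Hypothetical stand-in for the client's overload signal (not a Couchbase class). */
    static class OverloadedException extends RuntimeException {
        OverloadedException(String message) {
            super(message);
        }
    }

    /** Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxDelayMillis. */
    static long delayMillis(int attempt, long baseMillis, long maxDelayMillis) {
        long delay = baseMillis;
        // Stop doubling once the cap is reached so the value cannot grow without bound.
        for (int i = 0; i < attempt && delay < maxDelayMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxDelayMillis);
    }

    /** Runs the operation, retrying with growing pauses only when it signals overload. */
    static <T> T retry(Callable<T> operation, int maxAttempts,
                       long baseMillis, long maxDelayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 0; ; attempt++) {
            try {
                return operation.call();
            } catch (OverloadedException e) {
                if (attempt + 1 >= maxAttempts) {
                    throw e; // out of attempts: let the caller decide what to do
                }
                Thread.sleep(delayMillis(attempt, baseMillis, maxDelayMillis));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Simulated write that reports overload on its first two calls, then succeeds.
        String result = retry(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new OverloadedException("queue full");
            }
            return "stored";
        }, 5, 10, 1000);
        System.out.println(result + " after " + calls.get() + " attempts");
    }
}
```

In real code the lambda would wrap the actual write and the catch clause would name the client's own exception. Note that Thread.sleep blocks the calling thread, so inside an actor you would probably want to schedule the retry instead of sleeping; the delay calculation stays the same either way.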

Upvotes: 2
