Som Poddar

Reputation: 1451

MongoDB benchmark with Ruby

I work for an early-stage startup and we are currently choosing a hosted MongoDB service. Our requirement is fairly simple: we need a 'medium' sized Mongo server where a daily job will import around 100K JSON objects (with some additional reporting). Inspired by these snippets (link), I have written a simple benchmark harness (link here). But to my surprise, the results are quite slow (roughly 2 seconds per insert using a hosted MongoDB service). I am sure something is wrong with my implementation. Can anyone help me out? How do you benchmark Mongo INSERT operations with Ruby?
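For reference, the harness is essentially of this shape (a trimmed sketch using the legacy mongo 1.x driver's bulk API; HOST, PORT, DBNAME and the document shape stand in for the real values in the linked script):

require 'mongo'
include Mongo

# connect to the hosted service (placeholder connection details)
coll = MongoClient.new(HOST, PORT).db(DBNAME)['dummyCollection']

counter = 0
start = Time.now
while counter < 20_000
  # queue 20 inserts and send them as one ordered bulk op
  bulk = coll.initialize_ordered_bulk_op
  20.times { bulk.insert('name' => 'dummy', 'value' => rand(1000)) }
  bulk.execute
  counter += 20
  puts "counter: #{counter} timer: #{Time.now - start}"
  start = Time.now  # timer measures each batch on its own
end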

Upvotes: 1

Views: 478

Answers (1)

Matt

Reputation: 74630

Your code looks fine, and runs fine on a modest mongod VM (2 CPUs / 512MiB RAM / VDI file on a spindle disk):

counter: 20 timer: 0.17177268
counter: 40 timer: 0.169776753
counter: 60 timer: 0.170003466...

To demonstrate the impact network latency has on these results, add a latency of 100ms to the mongo server's interface:

sudo tc qdisc add dev eth1 root handle 1:0 netem delay 100ms
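
The delay can be removed again after testing:

sudo tc qdisc del dev eth1 root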

This results in times around 2 seconds for each batch:

counter: 20 timer: 2.17708192
counter: 40 timer: 2.080139019
counter: 60 timer: 2.074216244


Mongo version < 2.6

One other thing to check is whether the service you are testing runs a server version earlier than 2.6. The initialize_ordered_bulk_op method will not do what you expect on < 2.6: the code will still function, but each insert will require a round trip to the mongo instance. This is probably not the case here, as you would need a round trip of about 5ms to get your ~2 second results. You can turn on logging for the MongoClient to check, just in case:

require 'mongo'
require 'logger'
include Mongo

logger = Logger.new(STDOUT)
db = MongoClient.new(HOST, PORT, :logger => logger).db(DBNAME)

If each of your 20 inserts produces its own log line, rather than one line for the whole batch, then you need to look for another service:

MONGODB (XX.Xms) dummyDB['dummyDB.dummyCollection'].insert...


Latency is bad

In any case, network latency will play a large part in your system if you're using an externally hosted DB.

The round trip time from your infrastructure to the Mongo service adds that overhead to every transaction. So a single-threaded test with small batches will be hit far harder in overall time than a multi-threaded test with huge batches.

Using your test setup values and building in a 100ms round trip as an example:

20000 inserts / 20 per batch * 100ms delay = 100 seconds total.   

Reworking your code a little lets you see the difference various bulk op sizes make at 100ms latency.
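
A sketch of that rework, sweeping over a few batch sizes (the coll handle, the document shape, and the 1000-inserts-per-size cap are assumptions; the bulk API is the same legacy one as above):

[5, 20, 50, 100, 500].each do |batch|
  puts "Bulk Insert batch of [#{batch}]"
  counter = 0
  start = Time.now
  while counter < 1000
    bulk = coll.initialize_ordered_bulk_op
    batch.times { bulk.insert('name' => 'dummy', 'value' => rand(1000)) }
    bulk.execute
    counter += batch
    next unless counter % 500 == 0
    puts "counter[#{counter}] timer[#{format('%.4f', Time.now - start)}]"
    start = Time.now  # timer covers each block of 500 inserts
  end
end

With the 100ms delay in place, that produces output like: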

Bulk Insert batch of [5]
counter[500] timer[10.2571]
counter[1000] timer[10.2656]

Bulk Insert batch of [20]
counter[500] timer[2.6140]
counter[1000] timer[2.6152]

Bulk Insert batch of [50]
counter[500] timer[1.0550]
counter[1000] timer[1.0543]

Bulk Insert batch of [100]
counter[500] timer[0.5396]
counter[1000] timer[0.5380]

Bulk Insert batch of [500]
counter[500] timer[0.3282]
counter[1000] timer[0.2300]

Along with your MongoDB benchmarking, it's a good idea to collect latency information for any of the services you are considering. You can use something like mtr to monitor the network, or simply pull the per-query timings from the MongoClient logger over a longer period. Spikes and variance in latency are probably worse than a slightly slower but consistent value, as they add unpredictability to your system.
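
For example, a report-mode mtr run that samples 100 round trips (the hostname here is a stand-in for your provider's):

mtr --report --report-cycles 100 mongo.example-host.com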

Upvotes: 2
