Reputation: 1451
I work for an early-stage startup, and we are currently choosing a hosted MongoDB service. Our requirement is fairly simple: we need a 'medium'-sized Mongo server into which a daily job will import around 100K JSON objects (with additional reporting). Inspired by these snippets (link), I wrote a simple benchmark harness (link here). To my surprise, the results are quite slow (roughly 2 seconds per insert using a hosted MongoDB service). I am sure something is wrong with my implementation. Can anyone help me out? How do you benchmark Mongo INSERT operations with Ruby?
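The actual harness sits behind the link above; for context, a minimal version of the idea might look like the sketch below. All names here are illustrative, and the persistence step is injected as a block and stubbed so the sketch runs standalone (with the legacy mongo driver, the block body would build and execute an ordered bulk op instead).

```ruby
require "benchmark"

# Hypothetical harness: times batched inserts, where the block is any
# callable that persists one batch (e.g. a Mongo bulk op execute).
def benchmark_inserts(total:, batch_size:, &insert_batch)
  docs = (1..total).map { |i| { "seq" => i, "payload" => "x" * 32 } }
  timings = []
  docs.each_slice(batch_size) do |batch|
    timings << Benchmark.realtime { insert_batch.call(batch) }
  end
  timings
end

# Stubbed persistence step so the sketch runs without a live server.
timings = benchmark_inserts(total: 100, batch_size: 20) { |batch| batch.size }
puts "batches: #{timings.size}, total: #{timings.sum.round(4)}s"
```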
Upvotes: 1
Views: 478
Reputation: 74630
Your code looks fine, and runs fine on a modest mongod VM:
2 [email protected] / 512MiB / VDI file on spindle disk.
counter: 20 timer: 0.17177268
counter: 40 timer: 0.169776753
counter: 60 timer: 0.170003466...
To demonstrate the impact network latency has on these results, add a latency of 100ms to the mongo server's interface:
sudo tc qdisc add dev eth1 root handle 1:0 netem delay 100ms
This results in times around 2 seconds for each batch:
counter: 20 timer: 2.17708192
counter: 40 timer: 2.080139019
counter: 60 timer: 2.074216244
One other thing to check is whether the service you are testing runs a server version earlier than 2.6. The initialize_ordered_bulk_op
method will not do what you expect on < 2.6: the code will still function, but each insert
will require a round trip to the mongo instance. This is probably not the case here, as you would need a return trip of about 5ms to produce your ~2 second results. You can turn on logging for the MongoClient to check, just in case:
require 'logger'
logger = Logger.new(STDOUT)
db = MongoClient.new(HOST, PORT, { :logger => logger }).db(DBNAME)
If each of your 20 inserts produces its own log line, rather than one line for the whole batch, then you need to look for another service.
MONGODB (XX.Xms) dummyDB['dummyDB.dummyCollection'].insert...
In any case, network latency will play a large part in your system if you're using an externally hosted db.
The round-trip time from your infrastructure to the Mongo service adds that overhead to every transaction. So a single-threaded test with small batches will be impacted far more in overall time than a multi-threaded test with huge batches.
Using your test setup values and building in a 100ms round trip as an example:
20000 inserts / 20 per batch = 1000 batches * 100ms delay = 100 seconds total.
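That arithmetic can be sanity-checked in a few lines (all values are from the worked example above, not measurements):

```ruby
total_inserts = 20_000
batch_size    = 20     # docs per bulk op
rtt           = 0.100  # assumed 100ms round trip per batch

batches = total_inserts / batch_size  # 1000 round trips
latency_overhead = batches * rtt      # seconds spent purely on the network
puts "#{batches} batches -> #{latency_overhead.round(1)} seconds of latency overhead"
```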
Reworking your code a little lets you see the difference various bulk-op sizes make at 100ms latency.
Bulk Insert batch of [5]
counter[500] timer[10.2571]
counter[1000] timer[10.2656]
Bulk Insert batch of [20]
counter[500] timer[2.6140]
counter[1000] timer[2.6152]
Bulk Insert batch of [50]
counter[500] timer[1.0550]
counter[1000] timer[1.0543]
Bulk Insert batch of [100]
counter[500] timer[0.5396]
counter[1000] timer[0.5380]
Bulk Insert batch of [500]
counter[500] timer[0.3282]
counter[1000] timer[0.2300]
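The rework might be sketched roughly like this. The round trip is simulated with a (scaled-down) sleep so the sketch runs quickly and without a live server; the commented-out lines show where the legacy driver's bulk op would go.

```ruby
require "benchmark"

# Scaled-down stand-in for the 100ms network delay; set it to 0.1
# to approximate the per-batch timings shown above.
SIMULATED_RTT = 0.01

def bench_batch_size(total_docs, batch_size)
  docs = Array.new(total_docs) { |i| { "n" => i } }
  Benchmark.realtime do
    docs.each_slice(batch_size) do |batch|
      # With the legacy mongo gem, each batch would be:
      #   bulk = collection.initialize_ordered_bulk_op
      #   batch.each { |doc| bulk.insert(doc) }
      #   bulk.execute
      sleep(SIMULATED_RTT)  # one round trip per executed batch
    end
  end
end

[5, 20, 50, 100].each do |size|
  puts "Bulk Insert batch of [#{size}] timer[#{bench_batch_size(200, size).round(4)}]"
end
```

Larger batches mean fewer round trips, so the total time falls roughly in proportion to the batch size, as in the figures above.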
Along with your mongodb benchmarking, it's a good idea to collect latency information for any of the services you are considering. You can use something like mtr
to monitor the network, or simply pull the per-query data from the MongoClient logger over a longer period. Spikes and variance in latency are probably worse than a slightly slower but consistent value, as they add unpredictability to your system.
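Once you have per-query timings (for instance, the millisecond values scraped from MONGODB log lines), a small summary makes spikes visible. This is an illustrative helper, not part of any driver:

```ruby
# Hypothetical: summarise per-query round-trip times in milliseconds,
# e.g. values pulled from "MONGODB (XX.Xms) ..." log lines.
def latency_summary(samples_ms)
  mean = samples_ms.sum / samples_ms.size.to_f
  var  = samples_ms.sum { |s| (s - mean)**2 } / samples_ms.size
  { mean: mean.round(2), stddev: Math.sqrt(var).round(2), max: samples_ms.max }
end

# A single spike dominates the stddev even when the mean looks fine.
puts latency_summary([12.1, 11.8, 13.0, 98.4, 12.3])
```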
Upvotes: 2