Jed Watson

Reputation: 20378

What are the performance implications of running db.copyDatabase() against a production mongo database?

I recently launched a web app that hasn't seen much production scale yet, but I expect (hope ;) it will in the near future.

I have found the ability to use db.copyDatabase() extremely useful to copy a snapshot of the current production system into development and am wondering what kind of issues I may run into as the production database grows / is put under heavier load.

The docs don't appear to indicate that the command is blocking (specifically there is a reference to the dataset becoming out of sync if data is added to either database while the command is running).

Since the db is being copied to a dev (or staging) server, time taken to rebuild indexes / etc will not be a big issue (at least for a while).

The docs are a bit light on guidelines in this case, so I'm hoping to get advice on:

- Is it appropriate to run db.copyDatabase() to copy from a live database in production?
- Is there a practical limit to the size past which it stops being feasible?

For reference, the app and database are hosted separately (heroku / mongolab). I'm also running db.dropDatabase() locally before the copyDatabase() command to get a completely fresh db.
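For concreteness, the refresh workflow described above looks roughly like this from the mongo shell connected to the dev server (the database names, host, and credentials below are placeholders, not my real ones):

    // connected to the local/dev mongod with the mongo shell
    use myapp_dev                              // hypothetical dev database name
    db.dropDatabase()                          // start from a completely fresh db

    // pull a snapshot of production over the wire
    db.copyDatabase(
        "myapp_production",                    // source db name (placeholder)
        "myapp_dev",                           // destination db name (placeholder)
        "ds012345.mongolab.com:12345",         // remote MongoLab host (placeholder)
        "produser",                            // remote username (placeholder)
        "prodpassword"                         // remote password (placeholder)
    )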

Upvotes: 1

Views: 1283

Answers (2)

angela

Reputation: 155

Not sure if you know, but you can schedule one-time or recurring backups through MongoLab's web interface. These backups can go to your own custom cloud storage container (e.g. Amazon S3) or you can choose to have MongoLab store it in one of its cloud storage containers.

These backups are binary dumps (taken via MongoDB's mongodump tool), and you can download them straight from MongoLab's UI.
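If you would rather drive the same kind of binary dump yourself, a rough sketch with the standard tools looks like this (host, credentials, and paths are placeholders):

    # take a binary dump of the remote database
    mongodump --host ds012345.mongolab.com --port 12345 \
        --db myapp_production -u produser -p prodpassword \
        --out ./dump

    # restore it into a local dev database, dropping existing collections first
    mongorestore --db myapp_dev --drop ./dump/myapp_production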

We replicate all of our databases on shared instances and make every effort to take backups off of secondaries to minimize load on primaries (backups can be pretty resource intensive).
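If you run your own replica set, you can approximate the same idea by pointing mongodump directly at a secondary member rather than the primary (hostname below is a placeholder):

    # dump from a secondary so the primary doesn't take the extra load
    mongodump --host secondary-1.example.com --port 27017 \
        --db myapp_production --out ./dump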

Hope that helps.

Upvotes: 7

Sammaye

Reputation: 43884

This answer will inevitably be a bit subjective, since we don't know your hardware, data size, and so on.

Is it appropriate to run db.copyDatabase to copy from a live database in production?

A binary backup might be a better option here: docs.mongodb.org/manual/tutorial/backup-databases-with-binary-database-dumps/

Considering that it is basically a "copy" of the database using full table scans, it will have much the same effect as running the equivalent queries from your application. It could create a temporarily excessive working set and possibly even cause swapping within the computer's LRU, should your data not fit into your RAM.

Quite often your working set does not reflect how much it will cost to actually pull out all of the data, and since virtual memory (which mmap maps into) is not RAM, you might find that it doesn't fit.
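As a rough sanity check before kicking off a full copy, you can compare the on-disk size of the database against the RAM on the box from the mongo shell (the scale argument below reports sizes in GB):

    // report database sizes scaled to GB
    var stats = db.stats(1024 * 1024 * 1024)
    print("dataSize (GB):    " + stats.dataSize)
    print("storageSize (GB): " + stats.storageSize)
    print("indexSize (GB):   " + stats.indexSize)
    // if dataSize + indexSize is well above the server's RAM, a full scan
    // like copyDatabase() will churn the working set and may hit swap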

Aside from the RAM problems, you could also run into read lock contention, depending on many, many factors. That is something to bear in mind.

I am sure there are more problems I haven't listed.

However, it is worth mentioning that most of these problems only really appear with a very large dataset.

Is there a practical limit to the size past which it stops being feasible?

It all depends on how long you are prepared to wait for the data and how much of the working set your server(s) can handle, but I would probably go with the scenario of the linked question and say that 100GB is a good limit to go by.

Upvotes: 3
