Jed Watson

Reputation: 20378

What are the performance implications of running db.copyDatabase() against a production mongo database?

I recently launched a web app that hasn't seen much production scale yet, but I expect (hope ;) it will in the near future.

I have found the ability to use db.copyDatabase() extremely useful to copy a snapshot of the current production system into development and am wondering what kind of issues I may run into as the production database grows / is put under heavier load.

The docs don't appear to indicate that the command is blocking (specifically there is a reference to the dataset becoming out of sync if data is added to either database while the command is running).

Since the db is being copied to a dev (or staging) server, time taken to rebuild indexes / etc will not be a big issue (at least for a while).

The docs are a bit light on guidelines in this case, so I'm hoping to get advice on:

- Is it appropriate to run db.copyDatabase() to copy from a live database in production?
- Is there a practical limit to the size past which it stops being feasible?

For reference, the app and database are hosted separately (heroku / mongolab). I'm also running db.dropDatabase() locally before the copyDatabase() command to get a completely fresh db.
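For concreteness, the refresh workflow described above looks roughly like this from the mongo shell connected to the dev server (the database names, host, and credentials below are placeholders, not my real ones):

    // connected to the local/dev mongod with the mongo shell
    use myapp_dev                              // hypothetical dev database name
    db.dropDatabase()                          // start from a completely fresh db

    // pull a snapshot of production over the wire
    db.copyDatabase(
        "myapp_production",                    // source db name (placeholder)
        "myapp_dev",                           // destination db name (placeholder)
        "ds012345.mongolab.com:12345",         // remote MongoLab host (placeholder)
        "produser",                            // remote username (placeholder)
        "prodpassword"                         // remote password (placeholder)
    )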

Upvotes: 1

Views: 1283

Answers (2)

angela

Reputation: 155

Not sure if you know, but you can schedule one-time or recurring backups through MongoLab's web interface. These backups can go to your own custom cloud storage container (e.g. Amazon S3) or you can choose to have MongoLab store it in one of its cloud storage containers.

These backups are binary dumps (taken via MongoDB's mongodump tool), and you can download them straight from MongoLab's UI.
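If you would rather drive the same kind of binary dump yourself, a rough sketch with the standard tools looks like this (host, credentials, and paths are placeholders):

    # take a binary dump of the remote database
    mongodump --host ds012345.mongolab.com --port 12345 \
        --db myapp_production -u produser -p prodpassword \
        --out ./dump

    # restore it into a local dev database, dropping existing collections first
    mongorestore --db myapp_dev --drop ./dump/myapp_production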

We replicate all of our databases on shared instances and make every effort to take backups off of secondaries to minimize load on primaries (backups can be pretty resource intensive).
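If you run your own replica set, you can approximate the same idea by pointing mongodump directly at a secondary member rather than the primary (hostname below is a placeholder):

    # dump from a secondary so the primary doesn't take the extra load
    mongodump --host secondary-1.example.com --port 27017 \
        --db myapp_production --out ./dump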

Hope that helps.

Upvotes: 7

Sammaye

Reputation: 43884

This answer will inevitably be a bit subjective, since we don't know your hardware, data size, and so on.

Is it appropriate to run db.copyDatabase to copy from a live database in production?

A binary backup might be a better option here: docs.mongodb.org/manual/tutorial/backup-databases-with-binary-database-dumps/

Considering that it is basically a "copy" of the database using full table scans, it will have much the same effect as running the equivalent queries from your application. It could create a temporarily excessive working set and possibly even cause swapping within the computer's LRU, should your data not fit into your RAM.

Quite often your working set does not reflect how much it will cost to actually pull out all of the data, and since virtual memory (which mmap maps into) is not RAM, you might find that it doesn't fit.
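As a rough sanity check before kicking off a full copy, you can compare the on-disk size of the database against the RAM on the box from the mongo shell (the scale argument below reports sizes in GB):

    // report database sizes scaled to GB
    var stats = db.stats(1024 * 1024 * 1024)
    print("dataSize (GB):    " + stats.dataSize)
    print("storageSize (GB): " + stats.storageSize)
    print("indexSize (GB):   " + stats.indexSize)
    // if dataSize + indexSize is well above the server's RAM, a full scan
    // like copyDatabase() will churn the working set and may hit swap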

Aside from the RAM problems, you could also run into read lock contention, depending on many, many factors. That is something to bear in mind.

I am sure there are more problems I haven't listed.

However, it is worth mentioning that most of these problems only really appear with a very large dataset.

Is there a practical limit to the size past which it stops being feasible?

It all depends on how long you are prepared to wait for the data and how much of the working set your server(s) can handle, but I would probably go with the scenario of the linked question and say that 100GB is a good limit to go by.

Upvotes: 3
