mimosvk

Reputation: 13

MongoDB backup -> tar -> gz -> gpg

I have a MongoDB server and I am using the mongodump command to create a backup. I run mongodump --out ./mongo-backup, then tar -czf ./mongo-backup.tar.gz ./mongo-backup, then gpg --encrypt ./mongo-backup.tar.gz > ./mongo-backup.tar.gz.gpg, and send the resulting file to the backup server.

My MongoDB database is 20GB according to the show dbs command, the mongodump backup directory is only 3.8GB, the gzipped tarball is only 118MB, and the gpg file is 119MB.

How is it possible to reduce a 20GB database to a 119MB file? Is it fault tolerant?

I created a new server (a clone of production), enabled the firewall to ensure that no one could connect, and ran this backup procedure. I then created a fresh new server and imported the data, and there are some differences:

I ran the same commands from the mongo shell, use db1; db.db1_collection1.count(); and use db2; db.db2_collection1.count();, and the results are:

Upvotes: 1

Views: 2201

Answers (1)

Stennie

Reputation: 65393

If you have validated the counts and size of documents/collections in your restored data, this scenario is possible although atypical in the ratios described.

My MongoDB database is 20GB according to the show dbs command

This shows you the size of files on disk, including preallocated space that exists from deletion of previous data. Preallocated space is available for reuse, but some MongoDB storage engines are more efficient than others.
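A quick way to see where that 20GB actually goes is db.stats() in the mongo shell. As a rough sketch (db1 stands in for your actual database name):

    use db1
    db.stats()
    // dataSize    -- uncompressed size of the documents (roughly what mongodump exports)
    // storageSize -- space allocated on disk for the collections
    // indexSize   -- space used by indexes (rebuilt on restore, not exported as data)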

MongoDB mongodump backup directory has only 3.8GB

The mongodump tool (as at v3.2.11, which you mention using) exports an uncompressed copy of your data unless you specify the --gzip option. This total should represent your actual data size but does not include storage used for indexes. The index definitions are exported by mongodump and the indexes will be rebuilt when the dump is reloaded via mongorestore.

With WiredTiger the uncompressed mongodump output is typically larger than the size of files on disk, which are compressed by default. For future backups I would consider using mongodump's built-in archiving and compression options to save yourself an extra step.
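As a sketch of what that could look like (the file name is just an example; add your usual connection and authentication options):

    mongodump --gzip --archive=./mongo-backup.archive.gz
    # encrypt and ship the single archive file as before, then restore with:
    mongorestore --gzip --archive=./mongo-backup.archive.gz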

Since your mongodump output is significantly smaller than the storage size, your data files are either highly fragmented or there is some other data that you have not accounted for such as indexes or data in the local database. For example, if you have previously initialised this server as a replica set member the local database would contain a large preallocated replication oplog which will not be exported by mongodump.

You can potentially reclaim excessive unused space by running the compact command for a WiredTiger collection. However, there is an important caveat: running compact on a collection will block operations for the database being operated on so this should only be used during scheduled maintenance periods.
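For example, a minimal sketch using the collection name from your count commands:

    use db1
    db.runCommand({ compact: "db1_collection1" })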

The gzipped tarball is only 118MB and the gpg file is 119MB.

Since mongodump output is uncompressed by default, compressing can make a significant difference depending on your data. However, 3.8GB to 119MB seems unreasonably good unless there is something special about your data (large number of small collections? repetitive data?). I would double check that your restored data matches the original in terms of collection counts, document counts, data size, and indexes.
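A rough sketch of such a comparison, run in the mongo shell against each database on both the original and the restored server:

    db.getCollectionNames().forEach(function (name) {
        var coll = db.getCollection(name);
        // document count, uncompressed data size in bytes, and number of indexes per collection
        print(name, coll.stats().count, coll.stats().size, coll.getIndexes().length);
    });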

Upvotes: 1
