Why does 24 MB of CSV data become 230 MB in a MongoDB collection?

Question

My Meteor app takes a CSV file, parses it with Baby Parse (Papa Parse for the server), and inserts the data into a MongoDB collection.

Each CSV row is inserted as a document. The 24 MB CSV file contains ~900,000 rows; hence, there are ~900,000 documents in the collection. Each document has 5 fields, including the document's unique id.

When I use dataSize() to get the collection size, I get 230172976. If I'm not mistaken, this number is in bytes, so it is about 230 MB.

Why is this gigantic increase happening? How can I fix this?

Sede · Accepted Answer

This is because the value returned by .dataSize() includes the record padding. Also note that if your documents don't have an _id field, one will be added, and each generated _id value is 12 bytes. You may want to read Record Allocation Strategies in the MongoDB documentation.
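To put rough numbers on this, here is a small sketch that uses only the figures from the question. The row count is approximate, and the assumption that 24 MB means 24 × 2^20 bytes is mine. The `roundUpToPowerOfTwo` helper is a hypothetical, simplified model of MMAPv1's power-of-2 record allocation (it ignores record headers and the different rounding used for very large records), included only to show how a mid-sized document can end up occupying a 256-byte record.

```javascript
// Figures taken from the question; the row count is approximate.
const dataSizeBytes = 230172976;    // value returned by dataSize()
const documentCount = 900000;       // ~900,000 rows
const csvBytes = 24 * 1024 * 1024;  // assumption: 24 MB = 24 * 2^20 bytes

// Average space per stored record versus per CSV row.
const avgRecordBytes = dataSizeBytes / documentCount;
const avgCsvRowBytes = csvBytes / documentCount;

// Simplified model of power-of-2 allocation: round a size up to the
// next power of two, starting at 32 bytes. Record headers are ignored.
function roundUpToPowerOfTwo(bytes) {
  let size = 32;
  while (size < bytes) {
    size *= 2;
  }
  return size;
}

console.log(avgRecordBytes.toFixed(1)); // 255.7
console.log(avgCsvRowBytes.toFixed(1)); // 28.0
console.log(roundUpToPowerOfTwo(150));  // 256
```

An average of roughly 256 bytes per record is consistent with each document being stored in a 256-byte padded record, although the exact BSON size of the documents is not given in the question.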

How can I fix this:

Use the collMod command with the noPadding flag, or the db.createCollection() method with the noPadding option. But you generally shouldn't do that because, as mentioned in the documentation:

Only set noPadding to true for collections whose workloads have no update operations that cause documents to grow, such as for collections with workloads that are insert-only.
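As a minimal sketch of the two options just mentioned, the snippet below builds the command and option documents as plain objects. The helper names and the collection name "rows" are placeholders I made up for illustration; the commented lines show how the objects would be passed to the mongo shell on a MongoDB 3.0 server.

```javascript
// Hypothetical helpers; "rows" is a placeholder collection name.

// Command document for changing an existing collection.
function buildNoPaddingCollMod(collectionName) {
  return { collMod: collectionName, noPadding: true };
}

// Options document for creating a new collection without padding.
function buildNoPaddingCreateOptions() {
  return { noPadding: true };
}

// In the mongo shell these would be used roughly as:
//   db.runCommand(buildNoPaddingCollMod("rows"));
//   db.createCollection("rows", buildNoPaddingCreateOptions());

console.log(JSON.stringify(buildNoPaddingCollMod("rows")));
// {"collMod":"rows","noPadding":true}
```

Either form only makes sense for an insert-only workload, per the documentation note quoted above.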

As Pete Garafano mentioned in the comment below, this applies only to the MMAPv1 storage engine, which is the default storage engine in MongoDB 3.0 and all previous versions.

MongoDB 3.2 uses the WiredTiger storage engine by default, so to use that option you will need to change the storage engine to MMAPv1, either in your configuration file or with the --storageEngine option.
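Before reaching for noPadding, it may help to check which storage engine the server is actually running. The sketch below assumes an object shaped like the result of db.serverStatus() in the mongo shell, which reports the engine under storageEngine.name; the helper names are hypothetical and the sample objects are stand-ins, not real server output.

```javascript
// `status` is expected to look like the result of db.serverStatus().
function storageEngineName(status) {
  return status && status.storageEngine ? status.storageEngine.name : undefined;
}

// noPadding is only meaningful when the engine is MMAPv1.
function noPaddingApplies(status) {
  return storageEngineName(status) === "mmapv1";
}

// In the mongo shell: noPaddingApplies(db.serverStatus())
// Stand-in objects for illustration:
console.log(noPaddingApplies({ storageEngine: { name: "wiredTiger" } })); // false
console.log(noPaddingApplies({ storageEngine: { name: "mmapv1" } }));     // true
```

If the check reports WiredTiger, the server would have to be started with MMAPv1 (for example `mongod --storageEngine mmapv1`) for the option to have any effect.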
