Dmitry Samoylov
Dmitry Samoylov

Reputation: 1318

Can I reduce mongo database size by normalizing and storing only values in each document?

I'm totally new to Mongo, and decided to give it a try with my dateset, which I store now in MySQL. I have started with a single table, which turns out to be document collection now.

Each document has about 30 properties and each property is generally a short string. So in MySQL I have varchar fields, and here in Mongo it's just BSON.

I was really surprised when I found out that the database size I got in Mongo almost 7 times bigger than mysql innodb. With 1,500,000 records/documents it is about 350 Mb in MySQL and 2.3 Gb in Monbo since it stores property name for each single piece of data.

Is there a way to come up with 'schema' and normalize data to reduce the database size?

UPD: I use MongoDB 3.4, and since db.serverstatus().storageEngine.name is mmapv1, I suppose I use MMap.

Sample document:

{
  "_id" : ObjectId("595c4f4342ce99299c19c379"),
  "someNumber" : "11420",
  "companyDataFromSource1" : {
    "lolNumber" : "11420",
    "businessName" : "Feather mighty shop Inc",
    "businessOwner" : "",
    "businessHealthCode" : "Q",
    "realUpdateDate" : "13-OCt-16",
    "someNumericInfo" : "10000",
    "yearWhenItHappened" : "2014",
    "moreDate" : "22-Jan-02",
    "anotherStatus" : "DE",
    "numValue1" : "1",
    "numValue2" : "1",
    "someProp" : "E",
    "anotherProp" : "R",
    "street" : "BB251 Database St",
    "city" : "Denver",
    "state" : "CO",
    "zip" : "12345",
    "country" : "US",
    "anotherStreet" : "TT251 Server St",
    "anotherCity" : "Sratford",
    "anotherState" : "WI",
    "anotherZip" : "54484",
    "anotherCountry" : "US",
    "telephone" : "(123) 481-2222",
    "fax" : "",
    "emailAddress" : "[email protected]",
  }
}

db.collecion('collection').stats results:

{
  "ns" : "db.collection",
  "size" : 1507059792.0,
  "count" : 1495099,
  "avgObjSize" : 1008,
  "numExtents" : 19,
  "storageSize" : 1580150784.0,
  "lastExtentSize" : 415174656.0,
  "paddingFactor" : 1.0,
  "paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
  "userFlags" : 1,
  "capped" : false,
  "nindexes" : 1,
  "totalIndexSize" : 60518752,
  "indexSizes" : {
      "_id_" : 60518752
  },
"ok" : 1.0
}

Upvotes: 2

Views: 1118

Answers (2)

felix
felix

Reputation: 9285

A few hints on how to reduce your database size:

Database engine

First, switch from MMApv1 to WiredTiger storage engine, as it's way more efficient to compress data. The default compression level is snappy, but you can compress even more using zlib with few performance trade off. See Mongodb 3.0 for details.

To move to WiredTiger, see this tutorial from MongoDB documentation: https://docs.mongodb.com/manual/tutorial/change-standalone-wiredtiger/

Reduce key size

Second thing is to reduce your documents size (currently ~1008 bytes according to the avgObjSize field ). To do this, use smaller keys (ideally, two letter long keys). for example,

companyDataFromSource1 ==> c1

The benefits from this will be small with compression enabled, but this is always a good practice as it reduce the size of data that will be send to your app / over network

Upvotes: 2

Sergio Tulentsev
Sergio Tulentsev

Reputation: 230296

Is there a way to come up with 'schema' and reduce the database size?

No. Mongodb is schemaless, that is one of its core features. You could use shorter names for your properties. Or use a compressing storage engine (WiredTiger).

Upvotes: 1

Related Questions