Roman Mishchenko

Reputation: 38

MongoDB schema performance optimization

Hello, I want to design a MongoDB schema for the best possible performance.

Generally my question is:

Which is better: one collection with a huge array of subdocuments inside each document (about 10,000 entries), or two separate collections linked by references (one of which may contain 50,000,000 records)?

Detailed information

I have a MongoDB model with complex subdocuments.

var usersSchema = new Schema({
email:{
    type: String,
    unique: true,
    required: true
},
packages : [{
    package : {type: Schema.Types.ObjectId, ref: 'Packages'},
    from : {type : Schema.Types.ObjectId, ref :'Languages'},
    to : {type : Schema.Types.ObjectId, ref :'Languages'},
    words : [{
        word: {type: String},
        progress: {type: Number,default : 0}
    }]
}]
});

Every user will probably have 3-10 packages with 1000 words. The application will probably have more than 10,000 users, so I'll probably store about 50,000,000 words. But I'd love to have pagination, normal search and other juicy MongoDB features for the words collection. As far as I know, though, these features are pretty hard to use with subdocuments.

My question is: which would be better for system performance? Subdocuments, split up by user but with awkward pagination, search and update, or one more independent model with 50,000,000 records, something like this:

var wordsSchema = new Schema({
    word: {type: String},
    progress: {type: Number, default: 0},
    // reference back to the owning user
    user: {type: Schema.Types.ObjectId, ref: 'Users'}
});

Upvotes: 0

Views: 1026

Answers (2)

Sammaye

Reputation: 43884

Which is better: one collection with a huge array of subdocuments inside each document (about 10,000 entries), or two separate collections linked by references (one of which may contain 50,000,000 records)?

The first thing that comes to mind here is: why is storing a reference costing you 5000 times what it costs to store in a subdocument?

Okay, looking at your schema, I believe the best approach is a separate collection for words, not packages.

The first red flag I saw is your double nesting here:

packages : [{
    package : {type: Schema.Types.ObjectId, ref: 'Packages'},
    from : {type : Schema.Types.ObjectId, ref :'Languages'},
    to : {type : Schema.Types.ObjectId, ref :'Languages'},
    words : [{
        word: {type: String},
        progress: {type: Number,default : 0}
    }]
}]

The words subdocument will be very hard to work with in the current version of MongoDB. Normally 2-3 levels of nesting start to cause problems, especially with positional operators.
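To make the nesting problem concrete, here is a minimal sketch that only builds the arguments for updating one word's progress in each layout; it does not connect to a database. It assumes MongoDB 3.6 or later, where the arrayFilters option exists (which may be newer than the version this answer was written against). The function names and the identifiers p and w are hypothetical.

```javascript
// Sketch only: builds update arguments, never talks to a database.
// Assumes MongoDB 3.6+ for arrayFilters; function names are hypothetical.

// Embedded layout: one filtered positional identifier per array level.
function buildEmbeddedProgressUpdate(userId, packageId, word, progress) {
  return {
    filter: { _id: userId },
    update: { $set: { 'packages.$[p].words.$[w].progress': progress } },
    options: { arrayFilters: [{ 'p.package': packageId }, { 'w.word': word }] }
  };
}

// Separate collection (wordsSchema from the question): a plain field match.
function buildSeparateProgressUpdate(userId, word, progress) {
  return {
    filter: { user: userId, word: word },
    update: { $set: { progress: progress } },
    options: {}
  };
}
```

With Mongoose, the three parts would be passed as Model.updateOne(filter, update, options). The separate-collection form needs no array addressing at all, which is the point being made about nesting.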

Now considering that you should always work from the highest possible value you can get here:

Every user will probably have 3-10 packages with 1000 words.

You have also got to consider the cost of housing this document. The operators you need will be in-memory ones such as $pull, $push, $addToSet etc., which means your entire document will need to be serialised and loaded into MongoDB's native C++ structs. This will be an extremely resource-consuming task, depending on the traffic to those documents.

Considering your comment:

I want to do a lot of read and write operations with the word collection, much less operations with the user collection.

it merely puts another nail in the coffin of embedding the words within the main user document. Given what I said in the previous paragraph, embedding will not work well because of the cost of using in-memory operators on the words array.

But I'd love to have pagination, normal search and other juicy MongoDB features for the words collection.

This will work much better if the words are split out. $slice is also an in-memory operator and would probably suffer diminished performance here.

And that's a quick reasoned response. I am sure there is more I could explain about my reasoning, but that should be enough.
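As an illustration of pagination over a separate words collection, here is a minimal sketch that computes the query parts for one page of a user's words. The helper name buildWordsPage, the sort order and the 1-based page numbering are assumptions for the example, not something taken from the question.

```javascript
// Sketch only: computes the query parts for one page of a user's words.
// buildWordsPage is a hypothetical helper; pages are numbered from 1.
function buildWordsPage(userId, page, pageSize) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError('page must be an integer >= 1');
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError('pageSize must be an integer >= 1');
  }
  return {
    filter: { user: userId },        // only this user's words
    sort: { word: 1, _id: 1 },       // _id breaks ties so pages stay stable
    skip: (page - 1) * pageSize,     // number of documents before this page
    limit: pageSize
  };
}
```

With a Mongoose model built from the question's wordsSchema (called Words here as an assumption), the result q would be used as Words.find(q.filter).sort(q.sort).skip(q.skip).limit(q.limit). A compound index such as { user: 1, word: 1 } would likely be needed for this to stay fast at tens of millions of documents.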

Upvotes: 2

Rohit Jain

Reputation: 2092

In my opinion, separate collections are better.

A couple of things to keep in mind:

  1. The maximum size of a single document is 16 MB.
  2. Don't forget to create indexes (they will improve query performance): http://docs.mongodb.org/manual/core/indexes-introduction/
  3. If you are using MongoDB 3.0 or greater, you can use the WiredTiger storage engine: http://docs.mongodb.org/manual/release-notes/3.0-upgrade/
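To put the 16 MB limit in perspective, here is a rough back-of-envelope estimate of the embedded words arrays at the question's upper bound (10 packages of 1000 words). The average word length of 10 bytes, the _id that Mongoose adds to each subdocument by default, and progress stored as a double are all assumptions, not figures from the question, and the other fields of the user document are ignored.

```javascript
// Rough estimate only, following the BSON layout; all inputs are assumptions.
const MAX_BSON_BYTES = 16 * 1024 * 1024; // MongoDB's 16 MB document limit

function estimateWordEntryBytes(avgWordBytes) {
  const docOverhead = 4 + 1;                       // int32 length prefix + trailing null
  const idField = 1 + 4 + 12;                      // type byte + "_id\0" + ObjectId
  const wordField = 1 + 5 + 4 + avgWordBytes + 1;  // type + "word\0" + int32 length + bytes + null
  const progressField = 1 + 9 + 8;                 // type + "progress\0" + double
  return docOverhead + idField + wordField + progressField;
}

function estimateWordsBytes(packages, wordsPerPackage, avgWordBytes) {
  const arrayKeyBytes = 5; // type byte + up to 3 index digits + null, per array element
  return packages * wordsPerPackage * (estimateWordEntryBytes(avgWordBytes) + arrayKeyBytes);
}

const total = estimateWordsBytes(10, 1000, 10);
console.log(total, (total / MAX_BSON_BYTES * 100).toFixed(1) + '% of the limit');
```

Under these assumptions the words come to about 0.66 MB per user, so the limit is a ceiling to watch as packages or words grow rather than something hit immediately.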

Hope it helps.

Upvotes: 1
