How to handle large data sets in MongoDB

Question

I need help deciding which schema is more appropriate for my MongoDB collection.

Let's say I want to store a list of the things a person has. There will be a relatively small number of people, but one person can have very many things. Let's assume people will number in the hundreds, but the things a person owns in the hundreds of thousands.

I can think of two options:

Option 1:

    [{
        id: 1,
        name: "Tom",
        things: [
            {
                name: 'red tie',
                weight: 0.3,
                value: 5
            },
            {
                name: 'carpet',
                weight: 15,
                value: 700
            } //... and 300'000 other things 
        ]
    },
    {
        id: 2,
        name: "Rob",
        things: [
            {
                name: 'can of olives',
                weight: 0.4,
                value: 2
            },
            {
                name: 'Porsche',
                weight: 1500,
                value: 40000
            }// and 170'000 other things
        ]
    } // and 214 other people
]

Option 2:

    [
        {
            name: 'red tie',
            weight: 0.3,
            value: 5,
            owner: {
                name: 'Tom',
                id: 1
            }
        },
        {
            name: 'carpet',
            weight: 15,
            value: 700,
            owner: {
                name: 'Tom',
                id: 1
            }
        },
        {
            name: 'can of olives',
            weight: 0.4,
            value: 2,
            owner: {
                name: 'Rob',
                id: 2
            }
        },
        {
            name: 'Porsche',
            weight: 1500,
            value: 40000,
            owner: {
                name: 'Rob',
                id: 2
            }
        } // and 20'000'000 other things
    ];

1. I will only ask for things from one owner in a single request and never ask for things from multiple owners.
2. I will need pagination for the returned list of things, so...
3. ...things will need to be sorted by one of the parameters.

From what I understand, the first point suggests it would be much more efficient to use Option 1 (querying only a few hundred documents instead of millions), but points 2 and 3 are handled much more easily with Option 2 (the limit, skip and sort methods instead of $slice projection and the Aggregation Framework).

Can anybody tell me which way would be more suitable? Or maybe I've got something wrong and there's an even better solution?

Philipp · Accepted Answer

> 1. I will only ask for things from one owner in a single request and never ask for things from multiple owners.
> 2. I will need pagination for the returned list of things, so...
> 3. ...things will need to be sorted by one of the parameters.

Your requirements 2 and 3 would be fulfilled much better by creating a collection where each item is an individual document. With an array, you would have to use the aggregation framework to $unwind that array, which can become quite slow. Your first requirement can easily be optimized for by creating an index on the owner.name or owner.id field of said collection, depending on which you use for querying.
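As a rough sketch of what that could look like with one document per thing (Option 2 from the question): the helper names `ensureOwnerSortIndex` and `thingsPage` are made up for illustration, and `things` is assumed to be a collection handle from mongosh or the Node.js driver. The index here is a compound one on `owner.id` plus the sort field, which is one reasonable choice, not the only one.

```javascript
// Hypothetical helpers; `things` is assumed to be a collection handle
// (mongosh or Node.js driver) holding one document per thing.

// Compound index: equality on the owner first, then the sort field, so
// one owner's things can be read from the index already in sorted order.
function ensureOwnerSortIndex(things, sortField) {
  return things.createIndex({ "owner.id": 1, [sortField]: 1 });
}

// Returns a cursor over one page (1-based) of a single owner's things,
// sorted ascending by `sortField`.
function thingsPage(things, ownerId, sortField, page, pageSize) {
  return things
    .find({ "owner.id": ownerId })
    .sort({ [sortField]: 1 })
    .skip((page - 1) * pageSize)
    .limit(pageSize);
}
```

For example, `thingsPage(db.things, 1, "value", 3, 50)` would skip the first 100 of Tom's things ordered by value and return up to the next 50. Keep in mind that `skip` still has to walk past the skipped entries, so very deep pages get slower.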

Also, MongoDB does not handle growing documents very well. To discourage users from creating indefinitely growing documents, MongoDB has a 16 MB per-document limit. When each of your items is a few hundred bytes, hundreds of thousands of array entries would exceed that limit.
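To put rough numbers on that, here is a back-of-the-envelope estimate. The 200 bytes per thing is an assumption standing in for "a few hundred bytes", and the function names are hypothetical; the 300'000 count is Tom's from the question. It ignores BSON overhead and the person's other fields, so the real document would be somewhat larger.

```javascript
// Assumed average size of one embedded thing ("a few hundred bytes" taken as 200).
const BYTES_PER_THING = 200;
// MongoDB's maximum BSON document size: 16 MB.
const MAX_DOCUMENT_BYTES = 16 * 1024 * 1024;

// Rough size of the embedded `things` array for one person.
function estimateArrayBytes(thingCount, bytesPerThing = BYTES_PER_THING) {
  return thingCount * bytesPerThing;
}

// How many things of the given size fit before the limit is reached.
function maxThingsPerDocument(bytesPerThing = BYTES_PER_THING) {
  return Math.floor(MAX_DOCUMENT_BYTES / bytesPerThing);
}

const tomBytes = estimateArrayBytes(300000);
console.log(tomBytes > MAX_DOCUMENT_BYTES); // true: 60,000,000 > 16,777,216
console.log(maxThingsPerDocument());        // 83886
```

Under these assumptions a single person document would run out of room after roughly 84'000 things, well short of the hundreds of thousands you expect.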
