Ekin Koc

Reputation: 2997

MongoDB document schema

I've been working on a web project with a MongoDB database layer. I have a particular entity that I cannot map to a document database properly, so I thought it would be better to get some feedback.

Say I have User and Item collections. Users can like or dislike items. Items also have tags, and users can like or dislike those tags too. I need to be able to look up like / dislike counts quickly.

What I came up with is something like this (for item):

{
    name: "Item Name",
    statistics : {
        likes:      5,
        dislikes:   6
    },
    tags: [
        { name: "Foo", likes: 10, dislikes: 20 },
        { name: "Bar", likes: 5,  dislikes: 1  }
    ]
}

This is pretty decent. But the problem is, I need to know if a user liked / disliked a tag or item. Now, what I came up with is something like this:

{
    name: "Item Name",
    statistics : {
        likes:      5,
        dislikes:   6
    },
    tags: [
        { 
            name: "Foo", 
            likes: 2, 
            dislikes: 1,
            votes: [
                { user: "user1_id", vote: 1 }, //like 
                { user: "user2_id", vote: 1 }, //like 
                { user: "user3_id", vote: -1 }, //dislike 
            ]
        },
        { 
            name: "Bar", 
            likes: 0,  
            dislikes: 0,
            votes: []
        }
    ]
}

This looks promising, and the biggest benefit I see here is that I can do atomic updates if someone changes his mind and dislikes something that he liked before.

But I expect around 10 tags in each item, with maybe 100 votes each. That gives around 1000 nested vote objects for each item. I know that MongoDB can handle 16 MB documents, but still, is it ok to store this much data in one document?

Should I go for a normalized model, maybe with a "tagvotes" collection and an "itemvotes" collection? It feels more natural to me, actually.

Just wondering if I'm thinking relational or rational?

Thanks.

Upvotes: 2

Views: 486

Answers (2)

mnemosyn

Reputation: 46291

is it ok to store this much data in one document?

I don't see problems with the amount of data you store per object, but your read/update patterns are worrying: every time you fetch the item, you'll also fetch all the votes, each user's id, etc. Also, when adding votes, you will grow the object. Sometimes, MongoDB will have to reallocate your object, which takes a bit of time. Over time, it will learn that you are frequently growing objects, and the padding factor will increase, but frequently growing objects is not the best idea.

I can do atomic updates if someone changes his mind and dislikes something that he liked before.

This is a bit tricky. You can use $pull and $push, but off the top of my head I don't know how you can also keep the likes and dislikes counts in sync. Moreover, what happens if a user really changes his mind? You'd have to do both a $push and a $pull on the same votes array in one update, and that is not possible if I remember correctly.

Just wondering if I'm thinking relational or rational?

Both. This is a relational problem :-)

Now I wanted to conclude that you should denormalize the counts and store the relations in a different collection, but Hightechrider already wrote that. Too slow. ;-)
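A minimal sketch of how the denormalized counts could be kept in sync when a vote changes, assuming the vote itself lives in a separate collection and the item keeps only the counters. The helper names (counterDelta, tagCounterUpdate) are hypothetical; the code only builds the filter and the $inc update document, it does not talk to a server.

```javascript
// Hypothetical helpers: names and shapes are assumptions, not from the question.
// Votes are encoded as in the question: 1 = like, -1 = dislike, 0 = no vote yet.

// Work out how the likes/dislikes counters must move for a vote change.
function counterDelta(oldVote, newVote) {
    const delta = {};
    if (oldVote === newVote) return delta;      // nothing to adjust
    if (oldVote === 1) delta.likes = -1;        // take back the old like
    if (oldVote === -1) delta.dislikes = -1;    // take back the old dislike
    if (newVote === 1) delta.likes = 1;         // count the new like
    if (newVote === -1) delta.dislikes = 1;     // count the new dislike
    return delta;
}

// Build the filter and $inc update for one tag embedded in an item document.
// "tags.$.likes" uses the positional operator to hit the tag matched by the filter.
function tagCounterUpdate(itemId, tagName, oldVote, newVote) {
    const delta = counterDelta(oldVote, newVote);
    const fields = Object.keys(delta);
    if (fields.length === 0) return null;       // no change, skip the write
    const inc = {};
    for (const field of fields) {
        inc["tags.$." + field] = delta[field];
    }
    return {
        filter: { _id: itemId, "tags.name": tagName },
        update: { $inc: inc }
    };
}

// user3 switches from like to dislike on tag "Foo".
const change = tagCounterUpdate("item1_id", "Foo", 1, -1);
console.log(JSON.stringify(change));
```

The resulting filter and update would be passed to an update call on the items collection. Note that the vote document and the counters are then written in two separate operations, so they can briefly disagree; the counters are a cache of what the votes collection says.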

Upvotes: 1

Ian Mercer

Reputation: 39277

At some point trying to embed everything becomes impossible in any M x N type of situation as M and N grow. Well before you reach that point you need to create a separate collection and do client-side joins; but that doesn't mean you have to normalize everything totally.

In this case, think about what views you want to show the user: clearly you will want to show the item, how many likes and dislikes it has, the set of tags that have been applied to it, and maybe how popular each of those tags is. But the actual list of users who liked/disliked the item and liked/disliked each tag can go into a separate document (in a separate collection).

With a schema like that you can do one query to get the item and everything you need to display alongside that item. And then, if you need it, just one more query to get the current user's opinions about that item and all of the tags they have voted on that are relevant to that item.
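As a rough sketch of that split, assuming one vote document per (user, item, tag) with tag set to null for a vote on the item itself. The field names, the votes array, and the helper functions are illustrative assumptions; the in-memory matches function only stands in for a real find() so the example runs without a server.

```javascript
// Hypothetical shapes: field and helper names are assumptions for illustration.

// Item document: only what the item page displays, with denormalized counts.
const item = {
    _id: "item1_id",
    name: "Item Name",
    statistics: { likes: 5, dislikes: 6 },
    tags: [
        { name: "Foo", likes: 2, dislikes: 1 },
        { name: "Bar", likes: 0, dislikes: 0 }
    ]
};

// Vote documents, kept in a separate collection.
const votes = [
    { user: "user1_id", item: "item1_id", tag: null,  vote: 1 },   // likes the item
    { user: "user1_id", item: "item1_id", tag: "Foo", vote: -1 },  // dislikes tag Foo
    { user: "user2_id", item: "item1_id", tag: "Foo", vote: 1 }    // likes tag Foo
];

// Query 1: the item and everything shown alongside it.
function itemFilter(itemId) {
    return { _id: itemId };
}

// Query 2: the current user's opinions about that item and its tags.
function userVotesFilter(userId, itemId) {
    return { user: userId, item: itemId };
}

// Equality-only stand-in for a find(filter) call.
function matches(doc, filter) {
    return Object.keys(filter).every(key => doc[key] === filter[key]);
}

const shownItem = [item].filter(doc => matches(doc, itemFilter("item1_id")))[0];
const myVotes = votes.filter(doc => matches(doc, userVotesFilter("user1_id", "item1_id")));
console.log(shownItem.name, myVotes.length);
```

With this layout the item document stays a fixed size as votes accumulate, and a compound index on the user and item fields of the votes collection would likely serve the second query.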

Upvotes: 5
