Reputation: 824
I'm using MongoDB to store analytics for multiple websites. The sites get millions of visits a day, spread across thousands of different URLs per day. I need to count the number of visits each URL receives.
For now, I need to fetch the previous day's data once a day.
Is it better to store each URL in its own document, or all URLs under one object in a single document?
Upvotes: 3
Views: 1138
Reputation: 116
Multiple documents or fewer documents with large objects
Sooner or later, everyone who uses MongoDB has to choose between multiple collections with id references and embedded documents. Both approaches have strengths and weaknesses, so it pays to learn both:
Use separate collections
db.posts.find();
{_id: 1, title: 'unicorns are awesome', ...}
db.comments.find();
{_id: 1, post_id: 1, title: 'i agree', ...}
{_id: 2, post_id: 1, title: 'they kill vampires too!', ...}
Use embedded documents
db.posts.find();
{_id: 1, title: 'unicorns are awesome', ..., comments: [
{title: 'i agree', ...},
{title: 'they kill vampires too!', ...}
]}
Separate collections offer the greatest querying flexibility
// sort comments however you want
db.comments.find({post_id: 3}).sort({votes: -1}).limit(5)
// pull out one or more specific comment(s)
db.comments.find({post_id: 3, user: 'leto'})
// get all of a user's comments joining the posts to get the title
var comments = db.comments.find({user: 'leto'}, {post_id: true})
var postIds = comments.map(function(c) { return c.post_id; });
db.posts.find({_id: {$in: postIds}}, {title: true});
Selecting embedded documents is more limited
// you can select a range (useful for paging)
// but can't sort, so you are limited to the insertion order
db.posts.find({_id: 3}, {comments: {$slice: [0, 5]}})
// you can select the post without any comments also
db.posts.find({_id: 54}, {comments: 0})
// the positional operator ($) in a projection returns only the first matching comment (MongoDB 2.2+)
db.posts.find({'comments.user': 'leto'}, {title: 1, 'comments.$': 1})
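Since $slice only pages through comments in insertion order, one workaround is to sort the embedded array in application code after fetching the post. A minimal sketch in plain JavaScript, assuming each comment carries a votes field; the sample document and the topComments helper are hypothetical:

```javascript
// Hypothetical post document, shaped like the embedded example above.
const embeddedPost = {
  _id: 3,
  title: 'unicorns are awesome',
  comments: [
    {title: 'i agree', votes: 2},
    {title: 'they kill vampires too!', votes: 7},
    {title: 'meh', votes: 0}
  ]
};

// Return the top `limit` comments by votes without mutating the document.
function topComments(doc, limit) {
  return doc.comments
    .slice() // copy, so the stored insertion order is left untouched
    .sort(function (a, b) { return b.votes - a.votes; })
    .slice(0, limit);
}

const top = topComments(embeddedPost, 2);
console.log(top.map(function (c) { return c.title; }));
// [ 'they kill vampires too!', 'i agree' ]
```

This still transfers every comment over the wire, so it only makes sense while the embedded array stays small.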
A document, including all its embedded documents and arrays, cannot exceed 16MB.
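To get a feel for how close a "one big document" design comes to that limit, you can estimate the size before committing to it. A rough sketch for Node.js, assuming a hypothetical daily document with one {url, visits} entry per URL; the URL pattern, the date used as _id, and the counts are all made up, and JSON length only approximates the BSON size MongoDB actually enforces:

```javascript
// Hypothetical daily document: one counter per URL, stored as an array of
// {url, visits} pairs. The URL pattern and the visit counts are invented.
function buildDailyDoc(urlCount) {
  const urls = [];
  for (let i = 0; i < urlCount; i++) {
    urls.push({url: 'https://example.com/page/' + i, visits: 1000 + i});
  }
  return {_id: '2013-01-01', urls: urls};
}

const MAX_DOC_BYTES = 16 * 1024 * 1024; // the 16 MB document size limit

// JSON length is only a rough stand-in for the real BSON size.
function approxBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), 'utf8');
}

const size = approxBytes(buildDailyDoc(5000));
console.log(size, size < MAX_DOC_BYTES);
```

For an exact figure, the legacy mongo shell offers Object.bsonsize(doc), which reports the size of a document as it would be stored.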
Separate collections require more work
// finding a post + its comments is two queries and requires extra work
// in your code to make it all pretty (or your ODM might do it for you)
db.posts.find({_id: 9001});
db.comments.find({post_id: 9001})
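The "extra work" is usually just stitching the two result sets together before rendering. A minimal sketch in plain JavaScript, where postDoc and commentDocs stand in for the results of the two queries above and withComments is a hypothetical helper, not part of any driver:

```javascript
// Stand-ins for the results of the two queries (hypothetical data).
const postDoc = {_id: 9001, title: 'unicorns are awesome'};
const commentDocs = [
  {_id: 1, post_id: 9001, title: 'i agree'},
  {_id: 2, post_id: 9001, title: 'they kill vampires too!'}
];

// Combine one post and its comments into a single object for rendering.
// The input post is left unchanged; a shallow copy is returned.
function withComments(p, comments) {
  const own = comments.filter(function (c) { return c.post_id === p._id; });
  return Object.assign({}, p, {comments: own});
}

const page = withComments(postDoc, commentDocs);
console.log(page.comments.length); // 2
```

The result has the same shape as the embedded version, so the rest of the application does not need to know which storage layout was used.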
Embedded documents are easy and fast (single seek)
// finding a post + its comments
db.posts.find({_id: 9001});
No big differences for inserts and updates
// separate collection insert and update
db.comments.insert({post_id: 43, title: 'i hate unicrons', user: 'dracula'});
db.comments.update({_id: 4949}, {$set : {title: 'i hate unicorns'}});
// embedded document insert and update
db.posts.update({_id: 43}, {$push: {comments: {title: 'lol @ emo vampire', user: 'paul'}}})
// this specific update requires that we store an _id with each comment
db.posts.update( {'comments._id': 4949}, {$inc:{'comments.$.votes':1}})
So, separate collections are good if you need to select individual documents, need more control over querying, or have huge documents. Embedded documents are good when you want the entire document, the document with a $slice of comments, or with no comments at all. As a general rule, if you have a lot of "comments" or if they are large, a separate collection might be best. Smaller and/or fewer documents tend to be a natural fit for embedding.
Remember, you can always change your mind. Trying both is the best way to learn.
Upvotes: 5