Reputation: 1277
I wanted to know if anyone knew if you can over use embedding on MongoDB. Not saying something like 100 levels deep, in my application my average document size can get pretty large, simple tests have shown documents of 177kb.
The application is for logging, so for example I take the Apache access log and get lots of things from it like a list of all the pages that were called, a lit of all the IP address and so on. And these are done by minute.
It is unlikely that that I would ever have a document that was at the MongoDB document size limit, but wanted to know if I keep each of the sub lists as there own document, would that make for better performance regarding, returning subset information (querying for all the IP addresses that took place over 5 minutes).
When I run the query I filter to only show the IP addresses, am I wasting the databases performance if I group each minute into one document, or am I wasting it if I split each list into its own document?
Upvotes: 1
Views: 1984
Reputation: 16380
You want to structure your collections and documents in a way that reflects how you intend to use the data. If you're going to do a lot of complex queries, especially with subdocuments, you might find it easier to split your documents up into separate collections. An example of this would be splitting comments from blog posts.
Your comments could be stored as an array of subdocuments:
# Example post document with comment subdocuments
{
title: 'How to Mongo!'
content: 'So I want to talk about MongoDB.',
comments: [
{
author: 'Renold',
content: 'This post, it's amazing.'
},
...
]
}
This might cause problems, though, if you want to do complex queries on just comments (e.g. picking the most recent comments from all posts or getting all comments by one author.) If you plan on making these complex queries, you'd be better off creating two collections: one for comments and the other for posts.
# Example post document with "ForeignKeys" to comment documents
{
_id: ObjectId("50c21579c5f2c80000000000"),
title: 'How to Mongo!',
content: 'So I want to talk about MongoDB.',
comments: [
ObjectId("50c21579c5f2c80000000001"),
ObjectId("50c21579c5f2c80000000002"),
...
]
}
# Example comment document with a "ForeignKey" to a post document
{
_id: ObjectId("50c21579c5f2c80000000001"),
post_id: ObjectId("50c21579c5f2c80000000000"),
title: 'Renold',
content: 'This post, it's amazing.'
}
This is similar to how you'd store "ForeignKeys" in a relational database. Normalizing your documents like this makes for querying both comments and posts easy. Also, since you're breaking up your documents, each document will take up less memory. The trade-off, though, is you have to maintain the ObjectId
references whenever there's a change to either document (e.g. when you insert/update/delete a comment or post.) And since there are no event hooks in Mongo, you have to do all this maintenance in your application.
On the other-hand, if you don't plan on doing any complex queries on a document's subdocuments, you might benefit from storing monolithic objects. For instance, a user's preferences isn't something you're likely to make queries for:
# Example user document with address subdocument
{
ObjectId("50c21579c5f2c800000000421"),
name: 'Howard',
password: 'naughtysecret',
address: {
state: 'FL',
city: 'Gainesville',
zip: 32608
}
}
Upvotes: 2