Reputation: 185
I am creating an application with Node.js and the native MongoDB driver (not Mongoose). I have a problem that has given me a headache for a few days now; please suggest a way to solve it! My MongoDB design looks like this:
post {
    _id: ObjectId(...),
    picture: 'some_url',
    comments: [
        {
            _id: ObjectId(...),
            user_id: ObjectId('123456'),
            body: "some content"
        },
        {
            _id: ObjectId(...),
            user_id: ObjectId('...'),
            body: "other content"
        }
    ]
}
user {
    _id: ObjectId('123456'),
    name: 'some name',      --> changeable at any time
    username: 'some_name',  --> changeable at any time
    picture: 'url_link'     --> changeable at any time
}
I want to query each post along with all the user information, so the result should look like this:
[{
    _id: ObjectId(...),
    picture: 'some_url',
    comments: [
        {
            _id: ObjectId(...),
            user_id: ObjectId('123456'),
            user_data: {
                _id: ObjectId('123456'),
                name: 'some name',
                username: 'some_name',
                picture: 'url_link'
            },
            body: "some content"
        },
        {
            _id: ObjectId(...),
            user_id: ObjectId('...'),
            body: "other content"
        }
    ]
}]
I tried to use a loop to manually fetch the user data and add it to each comment, but it proved to be difficult and beyond my coding skills :(
If anybody has a suggestion, it would be really appreciated.
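The loop I attempted looks roughly like this, using the Node.js native driver (the connection string, the collection names and the postId variable are placeholders):
// Load the post, collect the commenters' ids, fetch those users
// in one query and merge them into the comments.
var MongoClient = require('mongodb').MongoClient;

MongoClient.connect('mongodb://localhost:27017/blog', function(err, db) {
    if (err) throw err;
    db.collection('posts').findOne({_id: postId}, function(err, post) {
        if (err) throw err;
        var userIds = post.comments.map(function(c) { return c.user_id; });
        db.collection('users').find({_id: {$in: userIds}}).toArray(function(err, users) {
            if (err) throw err;
            var usersById = {};
            users.forEach(function(u) { usersById[u._id] = u; });
            post.comments.forEach(function(c) {
                c.user_data = usersById[c.user_id];
            });
            // post now has the shape shown above
            db.close();
        });
    });
});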
P/s: I am trying another approach in which I would embed all the user data into the comments, and whenever users update their username, name or picture, the application would update it in all their comments as well.
Upvotes: 4
Views: 4289
Reputation: 20693
As written before, there are several problems when over-embedding:
As of the time of this writing, BSON documents are limited to 16 MB. If that limit is reached, MongoDB throws an exception: you simply could not add more comments, and in the worst case you could not even change the (user-)name or the picture if the change would increase the size of the document.
It is not easily possible to query or sort the comments array under certain conditions. Some things would require a rather costly aggregation, others rather complicated statements.
While one could argue that once the queries are in place, this isn't much of a problem, I beg to differ. First, the more complicated a query is, the harder it is to optimize, both for the developer and subsequently for MongoDB's query optimizer. I have had the best results with simplifying data models and queries, speeding up responses by a factor of 100 in one instance.
When scaling, the resources needed for complicated and/or costly queries might even add up to whole machines when compared to a simpler data model and the corresponding queries.
Last but not least, you might well run into problems maintaining your code. As a simple rule of thumb:
The more complicated your code becomes, the harder it is to maintain. The harder code is to maintain, the more time it takes to maintain. The more time it takes to maintain code, the more expensive it gets.
Conclusion: Complicated code is expensive.
In this context, "expensive" both refers to money (for professional projects) and time (for hobby projects).
It is pretty easy: simplify your data model. Consequently, your queries will become less complicated and (hopefully) faster.
That's going to be a wild guess on my part, but the important thing here is to show you the general method. I'd define your use cases as follows: displaying a single post with its comments, displaying the latest posts with their latest comments, showing all posts of a given user, and letting a user change his or her user data.
First of all, we have a straightforward user model:
{
    _id: new ObjectId(),
    name: "Joe Average",
    username: "HotGrrrl96",
    picture: "some_link"
}
Nothing new here, added just for completeness. A post would look like this:
{
    _id: new ObjectId(),
    title: "A post",
    content: "Interesting stuff",
    picture: "some_link",
    created: new ISODate(),
    author: {
        username: "HotGrrrl96",
        picture: "some_link"
    }
}
And that's about it for a post. There are two things to note here. First, we store the author data we immediately need when displaying a post, since this saves us a query for a very common, if not ubiquitous, use case. Why don't we store the comments and commenter data accordingly? Because of the 16 MB size limit, we do not embed the comments in the post document. Instead, each comment document holds a reference to its post:
{
    _id: new ObjectId(),
    post: someObjectId,
    created: new ISODate(),
    commenter: {
        username: "FooBar",
        picture: "some_link"
    },
    comment: "Awesome!"
}
As with the posts, we have all the data necessary for displaying a comment.
What we have achieved now is that we circumvented the BSON size limit, and we don't need to refer back to the user data in order to display posts and comments, which should save us a lot of queries. But let's come back to the use cases and some more queries.
Querying the comments of a given post is totally straightforward now.
For all comments:
db.comments.find({post: objectIdOfPost})
For the 3 latest comments:
db.comments.find({post: objectIdOfPost}).sort({created: -1}).limit(3)
So for displaying a post and all (or some) of its comments, including the usernames and pictures, we are at two queries. That's more than you needed before, but we circumvented the size limit, and you can basically have an unlimited number of comments for every post. But let's get to something real: displaying the latest 5 posts and their latest 3 comments.
This is a two-step process. However, with proper indexing (I will come back to that later), this should still be fast (and hence resource-saving):
var posts = db.posts.find().sort({created: -1}).limit(5);
posts.forEach(
    function(post) {
        doSomethingWith(post);
        // fetch the 3 latest comments of the current post
        var comments = db.comments.find({post: post._id}).sort({created: -1}).limit(3);
        doSomethingElseWith(comments);
    }
);
var posts = db.posts.find({"author.username": "HotGrrrl96"},{_id:1}).sort({"created":-1});
var postIds = [];
posts.forEach(
function(post){
postIds.push(post._id);
}
)
var comments = db.comments.find({post: {$in: postIds}}).sort({post:1, created:-1});
Note that we have only two queries here. Although you need to "manually" make the connection between the posts and their respective comments, that is pretty straightforward, as sketched below.
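For illustration, a minimal sketch of that manual connection, assuming the posts were collected into a postsArray instead of just their _ids, and reusing the hypothetical doSomethingWith from above:
// Group the comments by the post they belong to...
var commentsByPost = {};
comments.forEach(function(comment) {
    var key = comment.post.toString();
    (commentsByPost[key] = commentsByPost[key] || []).push(comment);
});
// ...then hand each post its own comments.
postsArray.forEach(function(post) {
    doSomethingWith(post, commentsByPost[post._id.toString()] || []);
});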
Changing a username presumably is a rarely executed use case. However, it isn't very complicated with said data model.
First, we change the user document:
db.users.update(
    { username: "HotGrrrl96" },
    {
        $set: { username: "Joe Cool" },
        $push: { oldUsernames: "HotGrrrl96" }
    },
    {
        writeConcern: { w: "majority" }
    }
);
We push the old username into a corresponding array as a safety measure in case something goes wrong with the following operations. Furthermore, we set the write concern to a rather high level in order to make sure the data is durable. Then we update the posts:
db.posts.update(
    { "author.username": "HotGrrrl96" },
    { $set: { "author.username": "Joe Cool" } },
    {
        multi: true,
        writeConcern: { w: "majority" }
    }
)
Nothing special here. The update statement for the comments looks pretty much the same; a sketch follows below. While those queries take some time, they are rarely executed.
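For completeness, the corresponding statement for the comments might look along these lines (field names as in the comment document above):
db.comments.update(
    { "commenter.username": "HotGrrrl96" },
    { $set: { "commenter.username": "Joe Cool" } },
    {
        multi: true,
        writeConcern: { w: "majority" }
    }
)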
As a rule of thumb, one can say that MongoDB uses only one index per query. While this is not entirely true, since there are index intersections, it is easy to deal with. Another thing to know is that the leading fields of a compound index can be used independently (so-called index prefixes). So an easy approach to index optimization is to find the queries with the most fields used in operations which can make use of indexes, and to create compound indexes for them. Note that the order of the fields in the index matters. So, let's go ahead:
db.posts.createIndex({"author.username":1,"created":-1})
db.comments.createIndex({"post":1, "created":-1})
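To verify that a query actually uses one of these indexes, you can inspect its plan; the output format varies by MongoDB version:
db.comments.find({post: objectIdOfPost}).sort({created: -1}).explain()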
A fully embedded document per post admittedly is the fastest way of loading a post and its comments. However, it does not scale well, and due to the nature of the possibly complex queries necessary to deal with it, this performance advantage may be reduced or even eliminated.
With the above solution, you trade some speed (if any!) for basically unlimited scalability and a much more straightforward way of dealing with the data.
Hth.
Upvotes: 12
Reputation: 933
You are following the normalized data model approach. If you follow this model, you have to issue another query to get the user info; if you use the embedded document approach instead, every embedded copy of the user data has to be changed whenever the user doc is updated. Read http://docs.mongodb.org/v3.0/reference/database-references/ for more information.
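For example, with the normalized model, resolving a comment's author takes an extra query along these lines (postId is a placeholder):
// First query: get the post; its comments only hold user_id references.
var post = db.posts.findOne({_id: postId});
// Second query: resolve the author of the first comment.
var author = db.users.findOne({_id: post.comments[0].user_id});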
Upvotes: 0