Mongo DB Schema Design

Question

I'm struggling to find the best database design for an app. I have a SQL background and tend to create a more or less denormalised database design.

I have the following problem. I have a collection of "Articles" containing about 2000 articles. Each article has quite a lot of information. To implement a recommender system, I want to associate with every "User" a "PredictedRating" for each "Article". In SQL I would model this using three tables: "Articles", "Users", "UserToArticle". The query should work as follows: for every "Article", I want to retrieve the "PredictedRating" for the currently logged-in user. In SQL I would make a join over "Article" and "Users", preselecting the corresponding user. With the correct indexes this is very fast.

How could I implement this the MongoDB way? When I implement it as described, I'm forced to run a findOne() query for every article, which is very inefficient and slow (even when using an index).

Do you have any ideas? The important thing is that only the predicted ratings for the current user are published.

Christian Strempfer · Accepted Answer

Rules Of Thumb

The MongoDB Blog has some good advice on data modeling:

1. Use embedded documents whenever possible.
2. If a subdocument is often read on its own, it could be better not to embed it.
3. Keep arrays small. If an embedded array of documents keeps growing, replace it with an array of reference ids. If an array of references keeps growing, try to invert the references or extract the references into their own collection.
4. Application-level joins are still an option. When using indexes and projection correctly, there shouldn't be a performance drop.
5. You could embed documents that are rarely updated but often read, even if that means redundant data. Don't embed redundant data if you need to update it frequently, because the update cost could outweigh the read advantage.
6. Optimize your data model for your application. What needs to be read or written together should be moved closer together (into fewer collections).

Therefore modeling a document database isn't as straightforward as normalizing a relational data model. Once you have mastered these rules of thumb, you should read about data models in the MongoDB manual.

Example

We need to put three domain objects into MongoDB: user, article, and predicted rating. I assume that there are a lot of users and even more articles. It's pretty clear that we shouldn't put users and articles into one collection (rules 2, 4, and 5). Therefore we only need to decide where to put the predicted ratings.

Embedding ratings into articles

As your use case is to get all predicted ratings for a user, it would be counterproductive to put them into articles (6). You would need to search through all articles to get the ratings. Besides that, if you remove a user, you need to update every article.
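
To illustrate why this layout fits the use case poorly, here is a small sketch in plain JavaScript. The field names (title, predictedRatings) and the helper ratingsForUser are assumptions made for illustration, not part of the original schema. Collecting the ratings of one user touches every article, and removing a user would mean rewriting the array in every article.

```javascript
// Hypothetical article documents with the ratings embedded per article.
// Real documents would use ObjectId values instead of strings.
const articles = [
  { _id: "a1", title: "First",
    predictedRatings: [{ userId: "u1", predictedRating: 5.4 },
                       { userId: "u2", predictedRating: 2.0 }] },
  { _id: "a2", title: "Second",
    predictedRatings: [{ userId: "u2", predictedRating: 4.5 }] },
];

// Every article has to be inspected to collect the ratings of a single user.
function ratingsForUser(allArticles, userId) {
  return allArticles.flatMap((a) =>
    a.predictedRatings
      .filter((r) => r.userId === userId)
      .map((r) => ({ articleId: a._id, predictedRating: r.predictedRating }))
  );
}

console.log(ratingsForUser(articles, "u1"));
```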

Embedding ratings into users

Embedding ratings into users has the advantage that you only need one query to get user and rating data. But you'll probably want to add a rating for every article to each user, so the arrays will grow too much (3).
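
For comparison, a user document with embedded ratings might look like the sketch below. The field names (name, ratings) and the helper ratingFor are assumptions for illustration. With one entry per article, the array would hold about 2000 elements for the article count given in the question, and it grows with every new article.

```javascript
// Hypothetical user document with embedded predicted ratings.
const user = {
  _id: "u1", // an ObjectId in a real collection
  name: "alice",
  ratings: [
    { articleId: "a1", predictedRating: 5.4 },
    { articleId: "a2", predictedRating: 3.1 },
    // ... one entry per article
  ],
};

// One read of the user document yields every rating; no second query is needed.
function ratingFor(userDoc, articleId) {
  const entry = userDoc.ratings.find((r) => r.articleId === articleId);
  return entry ? entry.predictedRating : null;
}

console.log(ratingFor(user, "a2"));
```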

Putting ratings into their own collection

Therefore it's viable to put ratings into their own collection, with one document per user and article:

{
    _id: ObjectId("f01..."),
    userId: ObjectId("123..."),
    articleId: ObjectId("abc..."),
    predictedRating: 5.4
}

As said, it depends on your data volumes. If you only have a few users or a few articles, embedding the predicted ratings could be a simpler and faster solution.
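
With ratings in their own collection, the findOne() per article from the question can be replaced by two queries and a merge in application code, which is the application-level join mentioned in the rules of thumb. The following is only a sketch under assumptions: the collection names ratings and articles, the title field, the variable currentUserId, and the helper mergeRatings are all hypothetical.

```javascript
// Assumed mongosh queries that would produce the two inputs (names are illustrative):
//   db.ratings.createIndex({ userId: 1, articleId: 1 })
//   const ratings  = db.ratings.find({ userId: currentUserId },
//                                    { articleId: 1, predictedRating: 1 }).toArray()
//   const articles = db.articles.find({}, { title: 1 }).toArray()

// Application-level join: attach the current user's predicted rating to each article.
function mergeRatings(articles, ratings) {
  // Index the ratings by article id so each lookup is a single Map access.
  const byArticle = new Map(
    ratings.map((r) => [String(r.articleId), r.predictedRating])
  );
  return articles.map((a) => ({
    ...a,
    // Articles without a prediction for this user get null.
    predictedRating: byArticle.get(String(a._id)) ?? null,
  }));
}

// Stand-in data; real documents would carry ObjectId values instead of strings.
const articles = [
  { _id: "a1", title: "First" },
  { _id: "a2", title: "Second" },
];
const ratings = [{ userId: "u1", articleId: "a1", predictedRating: 5.4 }];

const merged = mergeRatings(articles, ratings);
console.log(merged);
```

Because the ratings query filters on userId, only the current user's predictions ever leave the database, and the number of round trips stays at two regardless of how many articles exist.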
