BenMann
BenMann

Reputation: 248

Mongo DB relations between documents in different collections

I'm not yet ready to let this go, which is why I re-thought the problem and edited the Q (original below).


I am using mongoDB for a weekend project and it requires some relations in the DB, which is what the misery is all about:

I have three collections:

Users
Lists
Texts

A user can have texts and lists - lists 'contain' texts. Texts can be in multiple lists.

I decided to go with separate collections (not embeds) because child documents don't always appear in context of their parent (eg. all texts, without being in a list).

So what needs to be done is reference the texts that belong into certain lists with exactly those lists. There can be unlimited lists and texts, though lists will be less in comparison.

In contrast to what I first thought of, I could also put the reference in every single text-document and not all text-ids in the list-documents. It would actually make a difference, because I could get away with one query to find every snippet in a list. Could even index that reference.

var TextSchema = new Schema({
      _id: Number,
      name: String,
      inListID: { type : Array , "default" : [] },
      [...]

It is also rather seldom the case that texts will be in MANY lists, so the array would not really explode. The question kind of remains though, is there a chance this scales or actually a better way of implementing it with mongoDB? Would it help to limit the amount of lists a text can be in (probably)? Is there a recipe for few:many relations?

It would even be awesome to get references to projects where this has been done and how it was implemented (few:many relations). I can't believe everybody shies away from mongo DB as soon as some relations are needed.



Original Question

I'll break it down in two problems I see so far: 1) Let's assume a list consists of 5 texts. How do I reference the texts contained in a list? Just open an array and store the text's _ids in there? Seems like those arrays might grow to the moon and back, slowing the app down? On the other hand texts need to be available without a list, so embedding is not really an option. What if I want to get all texts of a list that contains 100 texts.. sounds like two queries and an array with 100 fields :-/. So is this way of referencing the proper way to do it?

var ListSchema = new Schema({
  _id: Number,
  name: String,
  textids: { type : Array , "default" : [] },
  [...]

Problem 2) I see with this approach is cleaning the references if a text is deleted. Its reference will still be in every list that contained the text and I wouldn't want to iterate through all the lists to clean out those dead references. Or would I? Is there a smart way to solve this? Just making the texts hold the reference (in which list they are) just moves the problem around, so that's not an option.

I guess I'm not the first with this sort of problem but I was also unable to find a definitive answer on how to do it 'right'.

I'm also interested in general thoughts on best-practice for this sort of referencing (many-to-many?) and especially scalability/performance.

Upvotes: 10

Views: 5269

Answers (5)

cn0047
cn0047

Reputation: 17061

I am not sure that is this question still actual, but i have similar experience.
First of all i want to say what tells official mongo documentation:

Use embedded data models when: you have one-to-one or one-to-many model.
For model many-to-many use relationships with document references.

I think is the answer) but this answer provide a lot of problems because:

  1. As were mentioned, mongo don't provide transactions at all.
  2. And you don't have foreign key constraints.
  3. Even if you have references (DBRefs) between documents, you will be faced with amazing problem how to dereference this documents.

Each this item - is huge piece of responsibility, even if you work at weekend project. And it might mean that you should be write many code to provide simple behaviour of your system (for example you can see how realize transaction in mongo here).

I have no idea how done foreign key constraints, and i don't saw something in this direction in mongo documentation, that's why i think that it amazing challenge (and risk for project).

And the last, mongo references - it isn't mysql join, and you dont receive all data from parent collection with data from child collection (like all fields from table and all fields from joined table in mysql), you will receive just REFERENCE to another document in another collection, and you will need to do something with this reference (dereference). It can be easily reached in node by callback, but only in case when you need just one text from one list, but if you need all texts in one list - it's terrible, but if you need all texts in more than one list - it's become nightmare...

Perhaps it's my not the best experience... but i think you should think about it...

Upvotes: 1

BenMann
BenMann

Reputation: 248

Writing an answer as I want to explain how I will proceed from here.

Taking into consideration the answers here and my own research on the topic, it might actually be fine storing those references (not really relations) in an array, trying to keep it relativley small: less than 1000 fields is very likely in my case.

Especially because I can get away with one query (which I first though I couldn't) that doen't even require using $in so far, I'm confident that the approach will scale. After all it's 'just a weekend-project', so if it doesn't and I end up re-writing - that's fine.

With a text-schema like this:

var textSchema = new Schema({
  _id: {type: Number, required: true, index: { unique: true }},
  ...
  inList: { type : [Number] , "default" : [], index: true }
});

I can simply get all texts in a list with this query, where inList is an indexed array containing the _ids of the texts in the list.

Text.find({inList: listID}, function(err, text) {
  ...      
});

I will still have to deal with foreign key constraints and write my own "clean-up" functions that take care of removing references if a list is removed - remove reference in every text that was in the list. Luckily this will happen very rarely, so I'm okay with going through every text once in a while.

On the other hand I don't have to care about deleting references in a list-document if a text is removed, because I only store the reference on one side of the relation (in the text-document). Quite an important point in my opinion!

@mnemosyn: thanks for the link and pointing out that this is indeed not a large join or in other words: just a very simple relation. Also some numbers on how long those complex operations take (ofc. hardware dependet) is a big help.
PS: Grüße aus Bielefeld.

What I found most helpful during my own research was this vid, where Alvin Richards also talks about many-to-many relations at around min. 17. This is where I got the idea of making the relation one-sided to save myself some work cleaning up the dead references.

Thanks for the help guys 👍

Upvotes: 0

mnemosyn
mnemosyn

Reputation: 46291

Relations are usually not a big problem, though certain operations involving relations might be. That depends largely on the problem you're trying to solve, and very strongly on the cardinality of the result set and the selectivity of the keys.

I have written a simple testbed that generates data following a typical long-tail distribution to play with. It turns out that MongoDB is usually better at relations than people believe.

After all, there are only three differences to relational databases:

  • Foreign key constraints: You have to manage these yourself, so there's some risk for dead links
  • Transaction isolation: Since there are no multi-document transactions, there's some likelihood for creating invalid foreign key constraints even if the code is correct (in the sense that it never tries to create a dead link), but merely interrupted at runtime. Also, it is hard to check for dead links because you could be observing a race condition
  • Joins: MongoDB doesn't support joins, though a manual subquery with $in does scale well up to several thousand items in the $in-clause, provided the reference values are indexed, of course

Iff you need to perform large joins, i.e. if your queries are truly relational and you need large amount of the data joined accordingly, MongoDB is probably not a good fit. However, many joins required in relational databases aren't truly relational, they are required because you had to split up your object to multiple tables, for instance because it contains a list.

An example of a 'truly' relational query could be "Find me all customers who bought products that got >4 star reviews by customers that ranked high in turnover in June". Unless you have a very specialized schema that essentially was built to support this query, you'll most likely need to find all the orders, group them by customer ids, take the top n results, use these to query ratings using $in and use another $in to find the actual customers. Still, if you can limit yourself to the top, say 10k customers of June, this is three round-trips and some fast $in queries.

That will probably be in the range of 10-30ms on typical cloud hardware as long as your queries are supported by indexes in RAM and the network isn't completely congested. In this example, things get messy if the data is too sparse, i.e. the top 10k users hardly wrote >4 star reviews, which would force you to write program logic that is smart enough to keep iterating the first step which is both complicated and slow, but if that is such an important scenario, there is probably a better suited data structure anyway.

Upvotes: 6

MegaMind
MegaMind

Reputation: 671

Using array in MongoDB is generally not preferable, and generally not advised by experts.

Here is a solution that came to my mind :

Each document of Users is always unique. There can be Lists and Texts for individual document in Users. So therefore, Lists and Texts have a Field for USER ID, which will be the _id of Users.

Lists always have an owner in Users so they are stored as they are.

Owner of Texts can be either Users or List, so you should keep a Field of LIST ID also in it, which will be _id of Lists.

Now mind that Texts cannot have both USER ID and LIST ID, so you will have to keep a condition that there should be only ONE out of both, the other should be null so that we can easily know who is the primary owner of the Texts.

Upvotes: 0

vmr
vmr

Reputation: 1935

Using MongoDB with references is a gateway to performance issues. Perfect example of what not to use. This is a m:n kind of relation where m and n can scale to millions. MongoDB works well where we have 1:n(few), 1:n(many), m(few):n(many). But not in situations where you have m(many):n(many). It will obviously result in 2 queries and lot of housekeeping.

Upvotes: 4

Related Questions