Have embedded documents instead of relying on foreign keys?

Question

I am new to MongoDB and NoSQL. I actually have different follow up questions depending on how this question gets answered. And I'll post my follow up questions as a separate question. Here we go...

I am trying to model a database to help me answer a question such as "Find all departments where 2 (TWO) or more teams each have 2 (TWO) or more employees known to have caused accidents greater than their team's max_accidents." If I were allowed to use a relational database with MySQL, I would solve the problem by making these tables:

department:department_id, location_id (FK to a location table not described here), unit_type

team: team_id, department_id, max_accidents

employee: employee_id, team_id, accidents

And then I'd use this query (untested, but hopefully you get the idea):

SELECT department_id FROM team
WHERE EXISTS (

    SELECT 1 FROM department
    WHERE department.department_id = team.team_id
    AND team.team_id IN (

        SELECT team_id FROM employee
        WHERE EXISTS (
            SELECT 1 FROM team
            WHERE team.team_id = employee.team_id
            AND employee.accidents > team.max_accidents
        ) GROUP BY team_id HAVING COUNT(*) >=2

    )
) GROUP BY department_id HAVING COUNT(*) >= 2

From what I understand about NoSQL databases, I can see two ways to model my collections. First, I could model each collection in exactly the same way I set out my tables above, meaning foreign keys would exist. The second possible way is this:

department = {_id,teams:[]team};

team = {_id,max_accidents,employees:[]employee};

employee = {_id,accidents};

My guess is that I should use the second approach where I embed arrays of documents. Then to perform my query, I would need to learn how to use the MongoDB aggregate framework as demonstrated in this question here:

Compare embedded document to parent field with mongoDB

I can build upon the aggregate approach to implement my HAVING COUNT(*) behaviour by using the $match feature as demonstrated in this question here:

What is the correct way to do a HAVING in a MongoDB GROUP BY?

I'd like confirmation if I'm approaching this problem correctly? If not, would be great if someone could explain why I might be approaching it the wrong way or what I might need to concern myself with.

Amit Phaltankar · Accepted Answer

From MongoDB Documentation

In general, use embedded data models when:

you have “contains” relationships between entities. See Model One-to-One Relationships with Embedded Documents.

you have one-to-many relationships between entities. In these relationships the “many” or child documents always appear with or are viewed in the context of the “one” or parent documents. See Model One-to-Many Relationships with Embedded Documents.

In general, embedding provides better performance for read operations, as well as the ability to request and retrieve related data in a single database operation. Embedded data models make it possible to update related data in a single atomic write operation.

This is a fair enough guideline. However you can take you call depending on your case.

Ask questions:

Can an employee be part of multiple teams ?
Can a team be part of multiple departments ?

If answer is yes, would not think about embedded documents.

Consider a scenario where one employee is part of multiple teams. That means the employee object exists in multiple documents.

Which can cause: data duplication, need more storage, make updates redundant.

Have embedded documents instead of relying on foreign keys?

Answers (1)

Related Questions