Reputation: 1443

MongoDB data model strategy guidance/advice

I'm just starting out with Node and MongoDB and am trying to understand how best to structure the data (having come from a lifetime of SQL).

So I've ended up with a data structure that's mostly embedded. I believe the relationships are logical, but I would like some outside feedback before I go too far down the rabbit hole!

Here's my proposed new MongoDB data model:

user
    name (string)
    email (string)
    avatar (string)
    password (string)
    newsletter (binary)

account
    admins
        user (objectId)
    name (string)
    logo (string)
    sub (number)
    stripe (string)
    property
        users
            user (objectId)
            party (number)
            role (number)
            admin (binary)
        name (string)
        ecd (date)
        complete (binary)
        activity
            description (string)
            user (objectId)
            time (date)
        task_group
            position (number)
            name (string)
            task
                assigned
                    user (objectId)
                    complete (binary)
                name (string)
                description (string)
                due (date)
                visibility (number)
                comment
                    user (objectId)
                    time (date)
                    comment (string)

Previously (I'm rebuilding an existing SQL app) there were a lot of tables purely to bridge the data, e.g. account_link to connect users with accounts (many to many). These have now been embedded, which allows for a slightly more intuitive structure. Given that the embedded data only needs to be accessed in the context of its parent, I think this is the way to go.

My concern is that certain sub docs will grow quite large. Do I have to worry at all about how much data is contained in a sub doc? Or should I treat sub docs exactly as I would tables? For example, if it transpires that each task_group contains 400,000 tasks, will that unnecessarily 'bloat' a property? Is there a point where you split this content out and create 'linking tables' purely for practical/performance reasons? Or am I just so stuck in the SQL mindset that this just feels wrong?
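
For a rough sense of scale: MongoDB caps a single BSON document at 16 MB, so an embedded array cannot grow without bound. The sketch below estimates the size of 400,000 embedded tasks. The sample task and its values are hypothetical, and JSON byte length is only a stand-in for the real BSON size, so treat the numbers as an order-of-magnitude guide.

```javascript
// Rough estimate only: JSON byte length is used as a stand-in for BSON size.
const MAX_BSON_BYTES = 16 * 1024 * 1024; // MongoDB's per-document limit (16 MB)

// Hypothetical task, shaped like the `task` sub document in the model above.
const sampleTask = {
  name: 'Order survey',
  description: 'Instruct the surveyor and chase the report',
  due: new Date(0).toISOString(),
  visibility: 1,
  assigned: [{ user: '507f1f77bcf86cd799439011', complete: false }],
  comment: []
};

const bytesPerTask = new TextEncoder().encode(JSON.stringify(sampleTask)).length;
const taskCount = 400000;
const estimatedBytes = bytesPerTask * taskCount;

console.log(`~${bytesPerTask} bytes per task`);
console.log(`~${(estimatedBytes / 1024 / 1024).toFixed(1)} MB for ${taskCount} tasks`);
console.log(`fits in one document: ${estimatedBytes <= MAX_BSON_BYTES}`);
```

Even with such a small task the estimate lands well above the limit, which suggests that tasks at that volume would have to live outside the parent document.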

Update

Given the advice received and referenced I believe I've produced a more appropriate design, although as has been noted elsewhere, it's more of an art than a science. Feedback still welcome!

Important considerations:

I won't re-write the linked blog post, but to summarise:

Embed the N side if the cardinality is one-to-few and there is no need to access the embedded object outside the context of the parent object
Use an array of references to the N-side objects if the cardinality is one-to-many or if the N-side objects should stand alone for any reasons
Use a reference to the One-side in the N-side objects if the cardinality is one-to-squillions
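
As a minimal sketch of how I read those three rules, here they are as plain JavaScript objects. All ids, names and values are made up for illustration, and which relationship falls under which rule is my own assumption rather than something the blog post prescribes.

```javascript
// All names and values below are hypothetical, for illustration only.

// 1. One-to-few: embed the N side in the parent.
const property = {
  _id: 'p1',
  name: '12 High Street',
  users: [
    { user: 'u1', role: 1 },
    { user: 'u2', role: 2 }
  ]
};

// 2. One-to-many: the parent holds an array of references to the N side.
const account = { _id: 'a1', name: 'Acme', properties: ['p1', 'p2'] };

// 3. One-to-squillions: each N-side document points back at its parent.
const activity = [
  { _id: 'e1', property: 'p1', description: 'Task created' },
  { _id: 'e2', property: 'p1', description: 'Task completed' },
  { _id: 'e3', property: 'p2', description: 'User invited' }
];

// Reading pattern 3 means querying the N side by parent id.
const activityForP1 = activity.filter((entry) => entry.property === property._id);
console.log(activityForP1.length); // 2
```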

I've also accounted for growth/document size consistency as referenced in one of the answers.

USER
    name (string)
    email (string)
    avatar (string)
    password (string)
    newsletter (binary)

ACCOUNT
    admins (USER reference array)
    name (string)
    logo (string)
    sub (number)
    stripe (string)
    properties (PROPERTY reference array)

PROPERTY
    name (string)
    ecd (date)
    complete (binary)
    users
        user (USER objectId)
        party (number)
        role (number)
        admin (binary)
    activity
        description (string)
        time (date)
    task_groups (TASK_GROUP reference array)

TASK_GROUP
    property (PROPERTY objectId)
    position (number)
    name (string)
    task
        assigned
            user (USER objectId)
            complete (binary)
        name (string)
        description (string)
        due (date)
        visibility (number)
        comment
            user (USER objectId)
            time (date)
            comment (string)
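
With references in place of embedding, reading a property together with its task groups becomes a two-step lookup done by the application. The sketch below simulates that with in-memory Maps standing in for the collections; the data and the `loadPropertyWithGroups` helper are hypothetical.

```javascript
// In-memory stand-ins for the property and task group collections (hypothetical data).
const properties = new Map([
  ['p1', { _id: 'p1', name: 'Plot 4', task_groups: ['g2', 'g1'] }]
]);
const taskGroups = new Map([
  ['g1', { _id: 'g1', property: 'p1', position: 1, name: 'Legal', task: [] }],
  ['g2', { _id: 'g2', property: 'p1', position: 2, name: 'Finance', task: [] }]
]);

// Two-step read: load the parent, then resolve its references and sort by position.
function loadPropertyWithGroups(propertyId) {
  const property = properties.get(propertyId);
  if (!property) return null;
  const groups = property.task_groups
    .map((id) => taskGroups.get(id))
    .filter(Boolean) // skip references that no longer resolve
    .sort((a, b) => a.position - b.position);
  return { ...property, task_groups: groups };
}

console.log(loadPropertyWithGroups('p1').task_groups.map((g) => g.name));
// [ 'Legal', 'Finance' ]
```

Against a real database the second step would presumably be a single query, for example an `$in` match on `_id` or a match on the `property` field of the task groups.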

Upvotes: 1

Answers (3)

Chacliff

Reputation: 71

I would even go so far as to separate the task from the task group and make the group it belongs to a property of the task. You may want to query for every task in a group, which you can do as long as each task records which group it belongs to.

But you may also want to find a particular task where the information about the group or groups it belongs to is still relevant. If you embed tasks in a task group in that fashion, you force your application to first figure out which category/group the task might belong to. Maybe the groups function more like a filter: find a task with this description among these groups.
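
A minimal sketch of that idea, using an in-memory array in place of a tasks collection. The data, the `groups` field name and both helper functions are hypothetical.

```javascript
// Hypothetical data: tasks live in their own collection and carry their group ids.
const tasks = [
  { _id: 't1', groups: ['g1'], name: 'Sign contract', description: 'Solicitor to sign' },
  { _id: 't2', groups: ['g1', 'g2'], name: 'Pay deposit', description: 'Transfer deposit' },
  { _id: 't3', groups: ['g2'], name: 'Arrange mortgage', description: 'Confirm offer' }
];

// Every task in one group.
const tasksInGroup = (groupId) => tasks.filter((t) => t.groups.includes(groupId));

// Groups used as a filter: tasks matching a description within any of the given groups.
const findInGroups = (text, groupIds) =>
  tasks.filter(
    (t) => t.description.includes(text) && t.groups.some((g) => groupIds.includes(g))
  );

console.log(tasksInGroup('g1').map((t) => t._id)); // [ 't1', 't2' ]
console.log(findInGroups('deposit', ['g2']).map((t) => t._id)); // [ 't2' ]
```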

The different queries you might want to run on these structures become more obvious when you think about how you want to query. The next step, after working from your queries to your model, is indexing. If you need an index on an embedded document, it should probably be a separate model related to the original, but this also depends on how much the embedded document relies on the properties of the structure above it.

tl;dr

A common rule of thumb I have found: if you do a lot of reads and very few writes on the embedded documents, it is OK to embed. With heavy writes as well as reads, you will want to separate them out rather than embed.

Upvotes: 0

kaxi1993

Reputation: 4710

[image]

Look at these pictures first, then I will explain them:

Every document in a collection has its own place and allocated space. When a document grows and there is not enough space, it is moved to the end of the collection and free space is left behind. For example, say you have a post collection whose documents contain an embedded comments array:

// post document with comments embedded
{
  _id: ObjectId('101'),
  comments: [
    { author: 'john', text: 'some text' },
    { author: 'mike', text: 'some text' }
  ]
}

This model is useful when a post can only have two or three comments, not a lot. But when users can push as many comments as they like, you must model the comments as separate documents with references. There will be a post collection and a comments collection.

post collection document:

{
  _id: ObjectId('101')
}

comments collection document:

{
  _id: ObjectId('10001'),
  _postId: ObjectId('101'), // references the post collection document
  text: 'some text',
  author: 'john'
}
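
To make the move from the embedded model to the referenced one concrete, here is a small sketch in plain JavaScript. The `splitComments` helper and its id generator are hypothetical, and plain strings stand in for ObjectId values so the snippet runs without a database.

```javascript
// Hypothetical helper: turn a post with embedded comments into a slim post
// plus standalone comment documents that reference it through `_postId`.
function splitComments(post, nextId) {
  const { comments = [], ...slimPost } = post;
  const commentDocs = comments.map((c) => ({ _id: nextId(), _postId: post._id, ...c }));
  return { post: slimPost, comments: commentDocs };
}

let counter = 10000;
const result = splitComments(
  {
    _id: '101',
    comments: [
      { author: 'john', text: 'some text' },
      { author: 'mike', text: 'some text' }
    ]
  },
  () => String(++counter)
);

console.log(result.post); // { _id: '101' }
console.log(result.comments.map((c) => c._id)); // [ '10001', '10002' ]

// Reading the comments back is a lookup by the parent's id.
const commentsForPost = result.comments.filter((c) => c._postId === '101');
console.log(commentsForPost.length); // 2
```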

Upvotes: 1

kaxi1993

Reputation: 4710

The documentation at http://docs.mongodb.org/manual/tutorial/model-embedded-one-to-one-relationships-between-documents/ covers:

1) Model One-to-One Relationships with Embedded Documents
2) Model One-to-Many Relationships with Embedded Documents
3) Model One-to-Many Relationships with Document References

Read all three sections.

I will say only one thing: when embedded docs grow quite large, you have to use document references, not embedding.

Upvotes: 0
