thatmarvin

Reputation: 2880

mongodb: Is this where I should just normalize my embedded objects?

I have a collection of Parents that contain EmbeddedThings, and each EmbeddedThing contains a reference to the User that created it.

UserCollection: [
  {
    _id: ObjectId(…),
    name: '…'
  },
  …
]

ParentCollection: [
  {
    _id: ObjectId(…),
    EmbeddedThings: [
      {
        _id: 1,
        userId: ObjectId(…)
      },
      {
        _id: 2,
        userId: ObjectId(…)
      }
    ]
  },
  …
]

I soon realized that I need to get all EmbeddedThings for a given user, which I managed to accomplish using map/reduce:

"results": [
  {
    "_id": 1,
    "value": [ `EmbeddedThing`, `EmbeddedThing`, … ]
  },
  {
    "_id": 2,
    "value": [ `EmbeddedThing`, `EmbeddedThing`, … ]
  },
  …
]

Is this where I should really just normalize EmbeddedThing into its own collection, or should I still keep map/reduce to accomplish this? Some other design perhaps?

If it helps, this is for users to see their list of EmbeddedThings across all Parents, as opposed to some reporting/aggregation task (which made me realize I might be doing this wrong).

Thanks!

Upvotes: 2

Views: 1031

Answers (1)

Sergio Tulentsev

Reputation: 230346

"To embed or not to embed: that is the question" :)

My rules are:

  • Embed if the embedded object only makes sense in the context of its parent object. For example, an OrderItem without an Order doesn't make sense.
  • Embed if performance requirements dictate it. It's very cheap to read a full document tree (as opposed to making several queries and joining the results programmatically).

You should look at your access patterns. If you load a Parent several thousand times per second and load a User's list of EmbeddedThings once a week, then map-reduce is probably a good choice. The per-user query will be slow, but that might be OK for your application.
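
To make that trade-off concrete, here is a minimal sketch in plain JavaScript of the work a per-user lookup has to do while the things stay embedded: visit every parent and pick out the matching entries. It runs over an in-memory array shaped like the documents in the question, not a real MongoDB collection. The helper name `embeddedThingsForUser`, the string ids, and the added `parentId` field are assumptions made up for illustration.

```javascript
// In-memory stand-in for ParentCollection (string ids instead of ObjectIds).
const parents = [
  { _id: "p1", EmbeddedThings: [{ _id: 1, userId: "u1" }, { _id: 2, userId: "u2" }] },
  { _id: "p2", EmbeddedThings: [{ _id: 1, userId: "u1" }] },
];

// Hypothetical helper: collect every EmbeddedThing created by one user.
// It has to look inside every parent, which is why this query is the slow one.
function embeddedThingsForUser(parentDocs, userId) {
  const found = [];
  for (const parent of parentDocs) {
    for (const thing of parent.EmbeddedThings || []) {
      if (thing.userId === userId) {
        // Remember which parent the thing came from.
        found.push({ parentId: parent._id, ...thing });
      }
    }
  }
  return found;
}

console.log(embeddedThingsForUser(parents, "u1"));
```

If this scan only runs once a week, its cost probably does not matter; if it runs on every page view, one of the other designs is likely a better fit.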

Yet another approach is to denormalize even more: when you add an EmbeddedThing, add it to both the Parent and the User.

  • Pros: queries are fast.
  • Cons: more complicated code, twice as many writes, and potential loss of sync (you update or delete in one place but forget to do the same in the other).
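
A rough sketch of that double-write idea, again in plain JavaScript over in-memory maps rather than real collections. The names `usersById`, `parentsById`, `addEmbeddedThing`, and `removeEmbeddedThing`, and the `embeddedThings` field on the user, are all hypothetical; the point is only that every write path has to touch both copies.

```javascript
// In-memory stand-ins for the two collections (string ids for brevity).
const usersById = new Map([["u1", { _id: "u1", name: "example", embeddedThings: [] }]]);
const parentsById = new Map([["p1", { _id: "p1", EmbeddedThings: [] }]]);

// Hypothetical helper: store the same thing on the parent and on the user.
function addEmbeddedThing(parentId, thing) {
  const parent = parentsById.get(parentId);
  const user = usersById.get(thing.userId);
  if (!parent || !user) throw new Error("unknown parent or user");
  parent.EmbeddedThings.push({ ...thing });         // write 1: the parent's copy
  user.embeddedThings.push({ parentId, ...thing }); // write 2: the user's copy
}

// Deletes must remove both copies too, or the two lists drift out of sync.
function removeEmbeddedThing(parentId, thingId) {
  const parent = parentsById.get(parentId);
  if (!parent) return;
  const thing = parent.EmbeddedThings.find((t) => t._id === thingId);
  if (!thing) return;
  parent.EmbeddedThings = parent.EmbeddedThings.filter((t) => t._id !== thingId);
  const user = usersById.get(thing.userId);
  if (user) {
    user.embeddedThings = user.embeddedThings.filter(
      (t) => !(t.parentId === parentId && t._id === thingId)
    );
  }
}

addEmbeddedThing("p1", { _id: 1, userId: "u1" });
addEmbeddedThing("p1", { _id: 2, userId: "u1" });
removeEmbeddedThing("p1", 1);

// The per-user list is now a direct lookup, with no scan over parents.
console.log(usersById.get("u1").embeddedThings);
```

Against a real database the two writes would be separate operations, so the code also has to decide what happens if the second one fails after the first has succeeded.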

Upvotes: 2
