Alexander Mills
Alexander Mills

Reputation: 99970

What is the most efficient way to store a tree data structure in MongoDB?

I was thinking the simplest thing would be a flat list:

{
  id: ObjectId()
  parentId:  ObjectId()
  value: ‘foo’,
}

Just one big collection. To find the child nodes of a node, just search through the list and find all instances where the parentId equals the current node id. Indexes on id/parentId.

This might be faster for writes, but reads might become pretty horrific. And we will have a lot more reads than writes!

MongoDB has some sort of built in tree data structure: https://docs.mongodb.com/manual/applications/data-models-tree-structures/

But I am wondering how that differs from a flat list like the one I proposed.

Upvotes: 3

Views: 7199

Answers (2)

vaheeds
vaheeds

Reputation: 2903

It is tricky in real-world apps. MongoDB recommends five ways:

Model Tree Structures with Parent References: Presents a data model that organizes documents in a tree-like structure by storing references to "parent" nodes in "child" nodes.

Model Tree Structures with Child References: Presents a data model that organizes documents in a tree-like structure by storing references to "child" nodes in "parent" nodes.

Model Tree Structures with an Array of Ancestors: Presents a data model that organizes documents in a tree-like structure by storing references to "parent" nodes and an array that stores all ancestors.

Model Tree Structures with Materialized Paths: Presents a data model that organizes documents in a tree-like structure by storing full relationship paths between documents. In addition to the tree node, each document stores the _id of the node's ancestors or path as a string.

Model Tree Structures with Nested Sets: Presents a data model that organizes documents in a tree-like structure using the Nested Sets pattern. This optimizes discovering subtrees at the expense of tree mutability.

Find full explanation here.

Upvotes: 1

Lucas
Lucas

Reputation: 4097

If you define an index on id and an index on parentId, find the children of a node should be really fast.

Why do you think it'll be horrific?


UPDATE: The worst case scenario would be O(log N) but it's important to note that the bucket size Mongo uses is 8192 (source). That means there's a really small constant multiplying this time which means MongoDB operations can be quite fast.

Upvotes: 3

Related Questions