Reputation: 109
I am new to MongoDB and have heard that it handles massive volumes of read and write operations well. Embedded documents are one of the features that make this possible, but I am not sure whether they can also cause performance issues. Book document example:
{
"_id": 1,
"Authors": [
{
"Email": "email",
"Name": "name"
}
],
"Title": "title",
...
}
If there are thousands of books by one author, and his email needs to be updated, I need to write some query which can
These operations do not seem efficient. But this type of update is ubiquitous, so I believe the developers have considered it. So, where did I get it wrong?
Upvotes: 3
Views: 2112
Reputation: 103455
Your current embedded schema design has its merits, one of them being data locality. Since MongoDB stores data contiguously on disk, putting all the data you need in one document ensures that the spinning disks will take less time to seek to a particular location on the disk.
If your application frequently accesses books information along with the Authors data, then you'll almost certainly want to go the embedded route. The other advantage of embedded documents is atomicity and isolation when writing data.
To illustrate this, say you want the email field updated on all books by one author. This can be done with a single update command (each document is modified atomically), which is not a performance issue with MongoDB:
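db.books.updateMany(
  // Field names are case-sensitive, so they must match the document ("Name", "Email").
  { "Authors.Name": "foo" },
  {
    // $ stands for the array element matched by the query above.
    "$set": { "Authors.$.Email": "[email protected]" }
  }
);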
or, with MongoDB versions before 3.2, which lack updateMany:
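db.books.update(
  // Field names are case-sensitive, so they must match the document ("Name", "Email").
  { "Authors.Name": "foo" },
  {
    "$set": { "Authors.$.Email": "[email protected]" }
  },
  // Without multi, only the first matching document is updated.
  { "multi": true }
)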
In the above, you use the positional $ operator, which facilitates updates to arrays that contain embedded documents by identifying an element in an array to update without explicitly specifying the position of the element in the array. Use it with the dot notation on the $ operator.
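One caveat worth sketching: the positional $ operator only updates the first array element that matches the query in each document. If a book could list the same author name more than once, the filtered positional operator $[<identifier>] with arrayFilters (available from MongoDB 3.6) updates every matching element. The helper name updateAuthorEmailEverywhere and the example address below are hypothetical, and the field names follow the capitalized ones in the question's document.

```javascript
// Hypothetical helper: set Email on every Authors element whose Name matches.
// Assumes MongoDB 3.6+ (arrayFilters) and a database handle `db`,
// such as the global `db` in the mongo shell.
function updateAuthorEmailEverywhere(db, authorName, newEmail) {
  return db.books.updateMany(
    // Only touch books that list this author.
    { "Authors.Name": authorName },
    // $[a] refers to each array element that passes the filter named "a".
    { $set: { "Authors.$[a].Email": newEmail } },
    { arrayFilters: [{ "a.Name": authorName }] }
  );
}

// Usage in the shell (placeholder address):
// updateAuthorEmailEverywhere(db, "foo", "new@example.com");
```

If each author appears at most once per book, the plain positional $ form is enough.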
For more details on data modelling in MongoDB, please read the docs: Data Modeling Introduction and, especially, Model One-to-Many Relationships with Embedded Documents.
The other design option you can consider is referencing documents, where you follow a normalized schema. For example:
// db.books schema
{
  "_id": 3,
  "authors": [1, 2, 3], // <-- array of references to the author collection
  "title": "foo"
}
// db.authors schema
/* 1 */
{
"_id": 1,
"name": "foo",
"surname": "bar",
"address": "xxx",
"email": "[email protected]"
}
/* 2 */
{
"_id": 2,
"name": "abc",
"surname": "def",
"address": "xyz",
"email": "[email protected]"
}
/* 3 */
{
"_id": 3,
"name": "alice",
"surname": "bob",
"address": "xyz",
"email": "[email protected]"
}
The normalized schema above, which uses document references, also has an advantage when you have one-to-many relationships with very unpredictable arity. If you have hundreds or thousands of author documents per book, embedding has significant drawbacks as far as space is concerned: the larger the document, the more RAM it uses, and MongoDB documents have a hard size limit of 16MB.
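With a referenced schema like this, the email change from the question touches a single author document rather than every book by that author. A minimal sketch, assuming the authors collection shown above; the helper name updateAuthorEmail and the example address are hypothetical:

```javascript
// Hypothetical helper: update one author's email in the normalized schema.
// Assumes a database handle `db` (e.g. the shell's global `db`) and
// updateOne, which is available from MongoDB 3.2.
function updateAuthorEmail(db, authorId, newEmail) {
  return db.authors.updateOne(
    { _id: authorId },              // authors are addressed by their _id
    { $set: { email: newEmail } }   // books only hold the _id, so they need no change
  );
}

// Usage in the shell (placeholder address):
// updateAuthorEmail(db, 1, "new@example.com");
```

The cost moves to the read side, since fetching a book together with its author details then needs a second query or a join.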
For querying a normalized schema, you can consider using the aggregation framework's $lookup operator, which performs a left outer join from the books collection to the authors collection in the same database, pulling the matching author documents into each book for processing.
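As a rough sketch of such a join against the books and authors collections shown above: the helper names and the output field authorDocs are hypothetical, and matching an array localField directly against _id assumes MongoDB 3.4 or later (earlier versions need an $unwind stage first).

```javascript
// Hypothetical helper: build a pipeline that joins one book to its author documents.
function bookWithAuthorsPipeline(bookId) {
  return [
    // Restrict to a single book before joining.
    { $match: { _id: bookId } },
    {
      $lookup: {
        from: "authors",        // collection to join with
        localField: "authors",  // array of author _id references in books
        foreignField: "_id",    // field matched in the authors collection
        as: "authorDocs"        // hypothetical name for the joined array
      }
    }
  ];
}

// Hypothetical helper: run the pipeline using a database handle `db`.
function findBookWithAuthors(db, bookId) {
  return db.books.aggregate(bookWithAuthorsPipeline(bookId));
}

// Usage in the shell:
// findBookWithAuthors(db, 3);
```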
That said, I believe your current schema is a better approach than creating a separate collection of authors, since separate collections require more work: finding a book plus its authors takes two queries, whereas with the embedded schema above the read is easy and fast (a single seek). There are no big differences for inserts and updates. Separate collections are good if you need to select individual documents, need more control over querying, or have huge documents. Embedded documents are good when you want the entire document, the document with a $slice of the embedded authors, or the document with no authors at all.
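The last two read patterns can be expressed as projections on the embedded schema. A minimal sketch, using the Authors field from the question's document; the helper name findBook and its maxAuthors parameter are hypothetical:

```javascript
// Hypothetical helper: fetch a book with at most `maxAuthors` embedded authors,
// or with the Authors array left out entirely when maxAuthors is 0.
// Assumes a database handle `db`, such as the shell's global `db`.
function findBook(db, bookId, maxAuthors) {
  const projection = maxAuthors === 0
    ? { Authors: 0 }                        // exclude the embedded array
    : { Authors: { $slice: maxAuthors } };  // keep only the first N authors
  return db.books.find({ _id: bookId }, projection);
}

// Usage in the shell:
// findBook(db, 1, 5);  // book with its first five authors
// findBook(db, 1, 0);  // book without authors
```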
The general rule of thumb is that if your application's query pattern is well known and data tends to be accessed in only one way, an embedded approach works well. If your application queries data in many ways, or you are unable to anticipate the query patterns, a more normalized document-referencing model is more appropriate.
Upvotes: 4
Reputation: 5662
I think you basically have the wrong schema design. MongoDB allows you to structure your data hierarchically, but that isn't an excuse for structuring it inefficiently. If it's likely you'll need to update thousands of documents across entire collections on a regular basis, then it's worth asking whether you have the right schema design.
There are lots of articles covering schema design and the comparison with relational DB structures. For example: http://blog.mongodb.org/post/87200945828/6-rules-of-thumb-for-mongodb-schema-design-part-1
Upvotes: 0