Eduardo G.R.

Reputation: 737

MongoDB embedded documents: size limit and aggregation performance concerns

MongoDB's documentation suggests putting as much data as possible in a single document. It also suggests NOT using ObjectId-reference-based sub-documents unless the referenced data must be accessed from more than one document.
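For clarity, the reference-based design the documentation discourages would look something like this in mongoose (a sketch for illustration, not my actual schema):

// Sketch only: the Machine document stores ObjectId references to
// Log documents kept in their own collection, instead of embedding them.
const MachineSchema = new mongoose.Schema({
    model: { type: String, required: true },
    description: { type: String, required: true },
    logs: [{ type: mongoose.Schema.Types.ObjectId, ref: "Log" }]
});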

In my case I have a one-to-many relationship like this:

Log schema:

const model = (mongoose) => {
    const LogSchema = new mongoose.Schema({
        result: { type: String, required: true },
        operation: { type: Date, required: true },
        x: { type: Number, required: true },
        y: { type: Number, required: true },
        z: { type: Number, required: true }
    });
    const model = mongoose.model("Log", LogSchema);
    return model;
};
module.exports = model;

Machine schema:

const model = (mongoose) => {
    const MachineSchema = new mongoose.Schema({
        model: { type: String, required: true },
        description: { type: String, required: true },
        logs: [ mongoose.model("Log").schema ]
    });
    const model = mongoose.model("Machine", MachineSchema);
    return model;
};
module.exports = model;

Each Machine will have many Production_Log documents (more than one million). Using embedded documents I hit the 16 MB per-document limit very quickly during my tests and couldn't add any more Production_Log documents to a Machine document.

My questions

  1. Is this a case where one should use ObjectId references instead of embedded sub-documents?

  2. Is there any other solution I could evaluate?

  3. I will be querying Production_Log documents with the aggregation framework to generate stats for each Machine. Are there any extra considerations for the schema design?

Thank you very much in advance for your advice!

Upvotes: 1

Views: 420

Answers (2)

Valijon

Reputation: 13113

Database normalization is not applicable to MongoDB

MongoDB scales better if you store the full information in a single document (data redundancy). Database normalization forces you to split data across collections, and once your data grows, that causes bottlenecks.

Use only the Log schema:

const model = (mongoose) => {
    const LogSchema = new mongoose.Schema({
        // machine fields copied (denormalized) into every log
        model: { type: String, required: true },
        description: { type: String, required: true },
        result: { type: String, required: true },
        operation: { type: Date, required: true },
        x: { type: Number, required: true },
        y: { type: Number, required: true },
        z: { type: Number, required: true }
    });
    const model = mongoose.model("Log", LogSchema);
    return model;
};
module.exports = model;

Read/write operations scale well this way.

With the aggregation framework you can process the data and compute the desired results.
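For example, a minimal sketch (the date filter and the chosen statistics are assumptions for illustration, not taken from the question) that groups the denormalized logs by machine model and computes a count and an average:

// Per-machine stats straight from the Log collection.
// The date range and the averaged field are assumed for illustration.
const stats = await Log.aggregate([
    { $match: { operation: { $gte: new Date("2020-01-01") } } },
    { $group: {
        _id: "$model",          // group by the denormalized machine model
        logCount: { $sum: 1 },
        avgX: { $avg: "$x" }
    } }
]);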

Upvotes: 2

Clement Amarnath

Reputation: 5466

Please see if this approach suits your needs.

The Log collection is the one that will keep growing, whereas the Machine collection will never exceed the 16 MB limit. Instead of embedding the Log documents in the Machine documents, try the reverse and embed the machine information in each Log document.

Your modified schemas would look like this:

Machine schema:

const model = (mongoose) => {
    const MachineSchema = new mongoose.Schema({
        model: { type: String, required: true },
        description: { type: String, required: true }
    });
    const model = mongoose.model("Machine", MachineSchema);
    return model;
};
module.exports = model;

Log schema:

const model = (mongoose) => {
    const LogSchema = new mongoose.Schema({
        result: { type: String, required: true },
        operation: { type: Date, required: true },
        x: { type: Number, required: true },
        y: { type: Number, required: true },
        z: { type: Number, required: true },
        // each log belongs to a single machine, so embed one subdocument
        machine: mongoose.model("Machine").schema
    });
    const model = mongoose.model("Log", LogSchema);
    return model;
};
module.exports = model;

If we still overshoot the 16 MB document size limit, then in the Log schema we can bucket the logs, creating a new document for every day/hour/week depending on how many logs we are generating, as sketched below.
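A minimal sketch of that bucketing idea (the LogBucket name and the day and count fields are my assumptions, not part of the original schemas): one document per machine per day, so no single document grows without bound.

const model = (mongoose) => {
    const LogBucketSchema = new mongoose.Schema({
        machine: { type: mongoose.Schema.Types.ObjectId, ref: "Machine", required: true },
        day: { type: Date, required: true },   // bucket key: one document per machine per day
        count: { type: Number, default: 0 },   // number of entries in this bucket
        entries: [{
            result: { type: String, required: true },
            operation: { type: Date, required: true },
            x: Number,
            y: Number,
            z: Number
        }]
    });
    const model = mongoose.model("LogBucket", LogBucketSchema);
    return model;
};
module.exports = model;

Appending a log entry then becomes an upsert against the current day's bucket:

await LogBucket.updateOne(
    { machine: machineId, day: startOfDay },
    { $push: { entries: entry }, $inc: { count: 1 } },
    { upsert: true }
);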

Upvotes: -1
