ZachB

Reputation: 15366

Insert only new properties into embedded document

I want to insert new values into an embedded document only if the embedded document doesn't already have the key. For example, my document is initially

{
    name: "doc1",
    embeddedDoc: {
        a: 123,
        b: 456
    }
}

Then if I were to "insert" {b: 789, c: 101}, I want the result to have embeddedDoc: {a: 123, b: 456, c: 101} (insert the new c, don't touch the existing b). Is there a way to do this?

The best solution I have so far is to run one update per key in the object being inserted:

collection.update(
    {"embeddedDoc.b": {$exists: false}},
    {$set: {"embeddedDoc.b": 789}}
)
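
Looped over all keys of the object being inserted, that approach looks roughly like this (a minimal mongo-shell sketch; the selector on name is an assumption, and collection is the collection object from the snippet above):

var newProps = {b: 789, c: 101};

Object.keys(newProps).forEach(function (key) {
    var field = "embeddedDoc." + key;

    var query = {name: "doc1"};
    query[field] = {$exists: false};    // only match documents missing this key

    var update = {$set: {}};
    update.$set[field] = newProps[key]; // set only the missing key

    collection.update(query, update);
});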

Upvotes: 1

Views: 150

Answers (2)

Blakes Seven

Reputation: 50406

Personally, I would not have an object in my document but an array instead. By itself it won't solve the issue of not overwriting data, but it does turn the "lookup" on a "key" into a query on "data" that can be indexed. This is going to make a "big" difference if you intend to run these kinds of updates regularly.

{
    "name": "Doc1",
    "properties": [
        { "key": "a", "value": 123 },
        { "key": "b", "value": 456 }
    ]
}

So with new data for "b" and "c", then you could make a query attempt like this:

db.collection.update(
    { "properties.key" { "$nin": [ "b", "c" ]  } },
    { 
        "$push": { 
            "properties": { 
                "$each": [
                    { "key": "b", "value": 789 },
                    { "key": "c", "value": 101 }
                ]
            }
        }
    },
    { "multi": true }
)

As said, that can at least use an index, which a "key" lookup with $exists cannot, and you can at least "try" this operation as a "first shot": any documents that meet the given condition, where neither the "b" nor the "c" "key" values are present, get both values pushed in one go.

The fallback case would of course be to try individually for each property here, but the whole process is really not hard to work out with Bulk Operations:

var obj = { "b": 789, "c": 101 };

var keys = Object.keys(obj),
    mapped = keys.map(function(key) {
        return { "key": key, "value": obj[key] };
    });

var bulk = db.collection.initializeOrderedBulkOp();

bulk.find({ "properties.key": { "$nin": keys } }).update({
    "$push": { "properties": { "$each": mapped } }
});

// Fallback: try each key individually where that key is not present
keys.forEach(function(key) {
    bulk.find({ "properties.key": { "$ne": key } }).update({
        "$push": { "properties": { "key": key, "value": obj[key] } }
    });
});

bulk.execute();

Noting that if the first query did actually succeed, then everything else might well be a "no-op". But if you are doing this with "multi" as this does, then you can never really be sure unless you already know how many documents to test for. So sending it all at once is probably "better safe than sorry".

So it's nice and simple to code, makes one trip to and from the server thanks to bulk operations, and is "efficient" in matching documents due to the ability to use an index.

The "discarding merge" logic you want to implement will always mean multiple updates. But if you at least make the process as "efficient" as possible, then that is at least one less thing to worry about.

Upvotes: 1

Salvador Dali

Reputation: 222461

I highly doubt that you can do better than 2 queries for many properties. If you have many properties, make a first query to get all the existing properties, then, knowing which properties you already have, find the missing ones as a set difference. Then update only the missing properties.

So for an arbitrary number of properties you end up with 2 queries and some simple application logic. Note that in a highly busy application something can happen between query 1 (find) and query 2 (update).
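
A minimal mongo-shell sketch of that idea, using the field names from the question (the selector on name is an assumption, and as noted the find-then-update pair is not atomic):

// Query 1: read the current embedded document
var doc = collection.findOne({name: "doc1"}, {embeddedDoc: 1});
var existing = (doc && doc.embeddedDoc) || {};

var newProps = {b: 789, c: 101};

// Set difference computed in the application: keep only keys not already present
var toSet = {};
Object.keys(newProps).forEach(function (key) {
    if (!existing.hasOwnProperty(key)) {
        toSet["embeddedDoc." + key] = newProps[key];
    }
});

// Query 2: update only the missing properties (skip when nothing is missing)
if (Object.keys(toSet).length > 0) {
    collection.update({name: "doc1"}, {$set: toSet});
}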

Upvotes: 1
