ZachB

Reputation: 15366

Insert only new properties into embedded document

I want to insert new values into an embedded document only if the embedded document doesn't already have the key. For example, my document is initially

{
    name: "doc1",
    embeddedDoc: {
        a: 123,
        b: 456
    }
}

Then if I were to "insert" {b: 789, c: 101}, I want the result to have embeddedDoc: {a: 123, b: 456, c: 101} (insert the new c, don't touch the existing b). Is there a way to do this?

The best solution I have so far is to run one update per key in the object being inserted:

collection.update(
    {"embeddedDoc.b": {$exists: false}},
    {$set: {"embeddedDoc.b": 789}}
)
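
Looped over all keys of the object being inserted, that approach looks roughly like this (a minimal mongo-shell sketch; the selector on name is an assumption, and collection is the collection object from the snippet above):

var newProps = {b: 789, c: 101};

Object.keys(newProps).forEach(function (key) {
    var field = "embeddedDoc." + key;

    var query = {name: "doc1"};
    query[field] = {$exists: false};    // only match documents missing this key

    var update = {$set: {}};
    update.$set[field] = newProps[key]; // set only the missing key

    collection.update(query, update);
});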

Upvotes: 1

Views: 150

Answers (2)

Blakes Seven

Reputation: 50406

Personally, I would not have an object in my document but an array instead. By itself it won't solve the issue of not overwriting data, but it does turn the "lookup" on a "key" into a query on "data" that can be indexed. This is going to make a "big" difference if you intend to run these kinds of updates regularly.

{
    "name": "Doc1",
    "properties": [
        { "key": "a", "value": 123 },
        { "key": "b", "value": 456 }
    ]
}

So with new data for "b" and "c", then you could make a query attempt like this:

db.collection.update(
    { "properties.key" { "$nin": [ "b", "c" ]  } },
    { 
        "$push": { 
            "properties": { 
                "$each": [
                    { "key": "b", "value": 789 },
                    { "key": "c", "value": 101 }
                ]
            }
        }
    },
    { "multi": true }
)

As said, that can at least use an index, which a "key" lookup with $exists cannot, and you can at least "try" this operation as a "first shot": any documents that meet the given condition, where neither the "b" nor the "c" "key" values are present, get both values pushed in one go.

The fallback case would of course be to try individually for each property here, but the whole process is really not hard to work out with Bulk Operations:

var obj = { "b": 789, "c": 101 };

var keys = Object.keys(obj),
    mapped = keys.map(function(key) {
        return { "key": key, "value": obj[key] };
    });

var bulk = db.collection.initializeOrderedBulkOp();

bulk.find({ "properties.key": { "$nin": keys } }).update({
    "$push": { "properties": { "$each": mapped } }
});

// Fallback: try each key individually where that key is not present
keys.forEach(function(key) {
    bulk.find({ "properties.key": { "$ne": key } }).update({
        "$push": { "properties": { "key": key, "value": obj[key] } }
    });
});

bulk.execute();

Noting that if the first query did actually succeed, then everything else might well be a "no-op". But if you are doing this with "multi" as this does, then you can never really be sure unless you already know how many documents to test for. So sending it all at once is probably "better safe than sorry".

So it's nice and simple to code, makes one trip to and from the server thanks to bulk operations, and is "efficient" in matching documents due to the ability to use an index.

The "discarding merge" logic you want to implement will always mean multiple updates. But if you at least make the process as "efficient" as possible, then that is at least one less thing to worry about.

Upvotes: 1

Salvador Dali

Reputation: 222461

I highly doubt that you can do better than 2 queries for many properties. If you have many properties, make a first query to get all the existing properties, then, knowing which properties you already have, find the missing ones as a set difference. Then update only the missing properties.

So for an arbitrary number of properties you end up with 2 queries and some simple application logic. Note that in a highly busy application something can happen between query 1 (find) and query 2 (update).
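
A minimal mongo-shell sketch of that idea, using the field names from the question (the selector on name is an assumption, and as noted the find-then-update pair is not atomic):

// Query 1: read the current embedded document
var doc = collection.findOne({name: "doc1"}, {embeddedDoc: 1});
var existing = (doc && doc.embeddedDoc) || {};

var newProps = {b: 789, c: 101};

// Set difference computed in the application: keep only keys not already present
var toSet = {};
Object.keys(newProps).forEach(function (key) {
    if (!existing.hasOwnProperty(key)) {
        toSet["embeddedDoc." + key] = newProps[key];
    }
});

// Query 2: update only the missing properties (skip when nothing is missing)
if (Object.keys(toSet).length > 0) {
    collection.update({name: "doc1"}, {$set: toSet});
}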

Upvotes: 1
