thegrid

Reputation: 514

Adding an array field to a collection based on values of another array field in mongodb

I have a collection which has the following structure as denoted by one document

{
_id : 1,
array1 : [{fld1 : 'doc1e1fld1', fld2: 'doc1e1fld2'},{fld1:'doc1e2fld1',fld2: 'doc1e2fld2'}]
}

I would like to add another field to ALL documents in the collection and set its value so that the modified doc above looks like:

{
    _id : 1,
    array1 : [{fld1 : 'doc1e1fld1', fld2: 'doc1e1fld2'},{fld1:'doc1e2fld1',fld2: 'doc1e2fld2'}],
    array2 : ['doc1e1fld1','doc1e2fld1']
}

Basically, I want to add a new array field to all documents in the collection and set its contents to an array made up of the fld1 values of all elements in that document's array1.

I had a look at "Update MongoDB field using value of another field", but I don't understand how I can extract only certain elements.

Upvotes: 2

Views: 2996

Answers (1)

Neil Lunn

Reputation: 151200

As you should have realized from the question you referenced, you cannot refer to the value of another field in a document when updating it. You essentially have to look up the documents and loop over the results to generate the updates.

So the bottom line is that you would really do this in code, reading each document and extracting the values from which to form your new array. That new array can be created either with $set or, possibly more safely, with the $push and $each operators in the updates.

Ideally you would also use the Bulk Operations API, which is the most efficient way to send the updates.
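As a rough sketch of that "do it in code" step, the helper below takes one document shaped like the one in the question and returns the filter and update documents you could pass to an update call. The name buildArray2Update and the useSet flag are hypothetical (they are not part of any MongoDB API), and treating a missing array1 as an empty array is an assumption. It is plain JavaScript, so it should run in the shell or in Node.js:

```javascript
// Hypothetical helper, not a MongoDB API: builds the filter and update
// documents for one source document.
function buildArray2Update(doc, useSet) {
    // Assumption: a missing or non-array array1 is treated as empty
    var source = Array.isArray(doc.array1) ? doc.array1 : [];

    // Collect the fld1 value of every element of array1
    var values = source.map(function(el) { return el.fld1; });

    var update = useSet
        ? { "$set": { "array2": values } }                 // replaces array2
        : { "$push": { "array2": { "$each": values } } };  // appends to array2

    return { "filter": { "_id": doc._id }, "update": update };
}

// Sample document taken from the question
var doc = {
    _id: 1,
    array1: [
        { fld1: 'doc1e1fld1', fld2: 'doc1e1fld2' },
        { fld1: 'doc1e2fld1', fld2: 'doc1e2fld2' }
    ]
};

var spec = buildArray2Update(doc, true);
```

The returned spec.filter and spec.update could then be handed to an update call or queued in a bulk operation. Note that the $push form appends, so running it twice would add the values twice, while the $set form simply overwrites.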

Finally you can also delegate some of the array construction workload to the aggregation framework instead of processing all of it in client code, but the updates still need to be performed on the results:

var bulk = db.collection.initializeOrderedBulkOp();
var count = 0;

var cursor = db.collection.aggregate([
    { "$project": {
        "array2": {
            "$map": {
                "input": "$array1",
                "as": "el",
                "in": "$$el.fld1"
            }
        }
    }}
]);

cursor.forEach(function(doc) {
    bulk.find({ "_id": doc._id }).updateOne({ 
        "$push": { "array2": { "$each": doc.array2 } }
    });
    count++;

    // send and drain once every 1000 documents
    if ( count % 1000 == 0 ) {
        bulk.execute();
        bulk = db.collection.initializeOrderedBulkOp();
    }
});

// Send any remaining operations that did not fill a complete batch of 1000
if ( count % 1000 != 0 )
    bulk.execute();

So there it is. You can use the aggregation framework with the $map operator like this to extract the required elements from the array, or you can just do it in code with a similar method, considering that you are not really "aggregating" anything anyway. The main point is that you are going to need to loop over the results in code, whatever language you do it in.
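If you skip the aggregation step, the same loop can be sketched in plain JavaScript. The function below is a hypothetical helper (toBulkBatches is not a MongoDB API) that turns an array of documents into batches of operations in the { updateOne: { filter, update } } shape accepted by bulkWrite() in MongoDB 3.2 and later. The batch size and the sample data are assumptions for illustration only:

```javascript
// Hypothetical helper, not a MongoDB API: converts documents into
// batches of bulkWrite-style updateOne operations.
function toBulkBatches(docs, batchSize) {
    var batches = [];
    var current = [];

    docs.forEach(function(doc) {
        // Assumption: a missing array1 is treated as an empty array
        var values = (doc.array1 || []).map(function(el) { return el.fld1; });

        current.push({
            "updateOne": {
                "filter": { "_id": doc._id },
                "update": { "$set": { "array2": values } }
            }
        });

        // Close the batch once it reaches the requested size
        if (current.length === batchSize) {
            batches.push(current);
            current = [];
        }
    });

    // Keep any remainder that did not fill a complete batch
    if (current.length > 0) {
        batches.push(current);
    }
    return batches;
}

// Made-up sample data for illustration
var sampleDocs = [
    { _id: 1, array1: [{ fld1: 'a1', fld2: 'b1' }, { fld1: 'a2', fld2: 'b2' }] },
    { _id: 2, array1: [{ fld1: 'c1', fld2: 'd1' }] },
    { _id: 3, array1: [] }
];

var batches = toBulkBatches(sampleDocs, 2);

// In the shell, each batch could then be sent with something like:
// batches.forEach(function(ops) { db.collection.bulkWrite(ops); });
```

This version uses $set rather than $push, so re-running it would overwrite array2 instead of appending to it. For a large collection you would feed it documents from a cursor in chunks rather than loading everything into one array.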

Of course, if you can live with creating a "new" collection containing the altered results, then the $out pipeline stage for aggregate could suit you well:

db.collection.aggregate([
    { "$project": {
        "array1": 1,
        "array2": {
            "$map": {
                "input": "$array1",
                "as": "el",
                "in": "$$el.fld1"
            }
        }
    }},
    { "$out": "newcollection" }
]);

In short, you will need some or all of these techniques to get the altered collection you are looking for. Either use the code above directly or adapt parts of it.

Upvotes: 3
