user3355820

Reputation: 272

How to add a field in all records of mongo db collection?

I am trying to combine two fields of a collection into a single field by prepending one to the other. I don't want to specify the _id in the query condition because I am dealing with 3.6 million documents; I simply want to combine the two fields for all records. For example, I have a collection like this:

{
  "_id": "56c58adf4f40",
  "data1": "test1",
  "data2": "test2"
}

I need the output to be data2 = data1 + data2 for all records:

{
  "_id": "56c58adf4f40",
  "data1": "test1",
  "data2": "test1 test2"
}

I have tried the following to set a field, but it only updates the single document whose _id I specify:

db.collection.update(
  { "_id": "56c58adf4f40" },
  { $set: { "data": "test" } }
)
Upvotes: 1

Views: 115

Answers (1)

chridam

Reputation: 103375

Use the $concat operator in the $project stage of an aggregation framework pipeline. Running the following aggregation will give you the desired output without the need to update your collection:

db.collection.aggregate([
    {
        "$project": {
            "data1": 1,
            "data2": { $concat: [ "$data1", " ", "$data2" ] }
        }
    }
])
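
Run against the sample document from the question, this returns:

{ "_id": "56c58adf4f40", "data1": "test1", "data2": "test1 test2" }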

Should you wish to update the collection with this result set, you can use the forEach() method on the aggregate() cursor to iterate the documents in the result and update your collection with each document inside the loop. For example:

var cursor = db.collection.aggregate([
        {
            "$project": {
                "data1": 1,
                "data2": { $concat: [ "$data1", " ", "$data2" ] }
            }
        }
    ]),
    updateCollUsingAgg = function(doc){
        db.collection.update(
            { "_id": doc._id },
            { "$set": { "data2": doc.data2 } }
        )
    }

cursor.forEach(updateCollUsingAgg);

You can also update the collection without the aggregate() method, by using the find() cursor to iterate your collection:

var cursor = db.collection.find(),
    updateCollUsingFind = function(doc){
        db.collection.update(
            { "_id": doc._id },
            { "$set": { "data2": doc.data1+" "+doc.data2 } }
        )
    };
cursor.forEach(updateCollUsingFind);
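
After either loop has run, the sample document from the question should match the desired output, which you can confirm with a quick query:

db.collection.find({ "_id": "56c58adf4f40" })
// { "_id" : "56c58adf4f40", "data1" : "test1", "data2" : "test1 test2" }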

For improved performance, especially when dealing with large collections, take advantage of the Bulk() API to update the collection efficiently in bulk, sending the operations to the server in batches (for example, a batch size of 500). This gives you much better performance because you are not sending every request to the server individually, but only once per 500 requests, making your updates more efficient and faster.

The following example demonstrates the Bulk() API, available in MongoDB versions >= 2.6 and < 3.2.

// Bulk update collection
var bulkUpdateOp = db.collection.initializeUnorderedBulkOp(), 
    pipeline = [
        {
            "$project": {
                "data1": 1,
                "data2": { $concat: [ "$data1", " ", "$data2" ] }
            }
        }
    ],
    counter =  0, // counter to keep track of the batch update size
    // Get modified data2 fields using aggregation framework
    cursor = db.collection.aggregate(pipeline); 

cursor.forEach(function(doc){
    // update collection
    bulkUpdateOp.find({"_id": doc._id}).updateOne({ "$set": { "data2": doc.data2 } }); 
    counter++; // increment counter
    // execute the bulk update operation in batches of 500
    if (counter % 500 == 0) { 
        bulkUpdateOp.execute();
        bulkUpdateOp = db.collection.initializeUnorderedBulkOp();
    }
});

if (counter % 500 != 0) { bulkUpdateOp.execute(); }
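
If you want to see how many documents each batch touched, execute() returns a BulkWriteResult in the shell whose counters you can inspect. A minimal sketch for the final batch (in place of the bare execute() call above):

if (counter % 500 != 0) {
    var res = bulkUpdateOp.execute();
    // print how many documents the last batch matched and modified
    printjson({ nMatched: res.nMatched, nModified: res.nModified });
}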

The next example applies to MongoDB version 3.2, which has since deprecated the Bulk() API and provides a newer set of APIs using bulkWrite().

It uses the same pipeline as above but, instead of iterating the result, creates the array of bulk operations with the cursor's map() method:


var pipeline = [
        {
            "$project": {
                "data1": 1,
                "data2": { $concat: [ "$data1", " ", "$data2" ] }
            }
        }
    ],
    cursor = db.collection.aggregate(pipeline),
    bulkUpdateOps = cursor.map(function (doc) { 
        return { 
            "updateOne": {
                "filter": { "_id": doc._id },
                "update": { "$set": { "data2": doc.data2 } } 
             }
        };
    });         

db.collection.bulkWrite(bulkUpdateOps);
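
To verify the outcome you can capture the return value instead of calling bulkWrite() bare; in the shell it exposes counts such as matchedCount and modifiedCount (a small sketch using the bulkUpdateOps array built above):

var result = db.collection.bulkWrite(bulkUpdateOps);
printjson({ matched: result.matchedCount, modified: result.modifiedCount });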

Upvotes: 2
