Juan Chô

Reputation: 572

Making a collection from collection subset in mongodb

I have a huge collection of documents (more than two million) and I find myself querying a very small subset, using something like

scs = db.balance_sheets.find({"9087n":{$gte:40}, "20/58n":{ $lte:40000000}})

which returns fewer than 5k results. The question is: can I create a new collection from the results of this query? I tried insert:

db.scs.insert(db.balance_sheets.find({"9087n":{$gte:40}, "20/58n":{ $lte:40000000}}).toArray())

But it gives me an error: Socket say send() errno:32 Broken pipe 127.0.0.1:27017. I tried aggregate:

db.balance_sheets.aggregate([{ "9087n":{$gte:40}, "20/58n":{ $lte:40000000}} ,{$out:"pme"}])

And I get "exception: A pipeline stage specification object must contain exactly one field." Any hints? Thanks

Upvotes: 1

Views: 749

Answers (1)

chridam

Reputation: 103365

The first option would be:

var cursor = db.balance_sheets.find({ "9087n": { "$gte": 40 }, "20/58n": { "$lte": 40000000 } });
while (cursor.hasNext()) {
    var doc = cursor.next();
    db.pme.save(doc);
}

As for the aggregation, try

db.balance_sheets.aggregate([
    {
        "$match": { "9087n": { "$gte": 40 }, "20/58n": { "$lte": 40000000 } }
    },
    { "$out": "pme" }
]);

For improved performance, especially when dealing with large collections, take advantage of the Bulk API for bulk inserts: you send the operations to the server in batches of, say, 500, which performs better because you are not making a round trip to the server for every document, only once per 500 operations.

The following demonstrates this approach. The first example uses the Bulk API, available in MongoDB versions >= 2.6 and < 3.2, to insert all the documents matching the query from the balance_sheets collection into the pme collection:

var bulk = db.pme.initializeUnorderedBulkOp(),
    counter = 0;

db.balance_sheets.find({
    "9087n": { "$gte": 40 },
    "20/58n": { "$lte": 40000000 }
}).forEach(function (doc) {
    bulk.insert(doc);

    counter++;
    if (counter % 500 === 0) {
        // Execute every 500 operations and re-initialize
        // the bulk builder for the next batch
        bulk.execute();
        bulk = db.pme.initializeUnorderedBulkOp();
    }
});
// Execute any remaining operations left in the queue
if (counter % 500 !== 0) { bulk.execute(); }

The next example applies to MongoDB version 3.2, which deprecates the Bulk API and provides a newer set of APIs using bulkWrite():

var bulkOps = db.balance_sheets.find({
    "9087n": { "$gte": 40 }, 
    "20/58n": { "$lte": 40000000 }
}).map(function (doc) { 
    return { "insertOne" : { "document": doc } };    
});

db.pme.bulkWrite(bulkOps);
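Note that a single bulkWrite() call holding thousands of operations can itself get large; one option is to keep the 500-per-batch idea by splitting the ops array first. Below is a minimal sketch of such a batching helper (`chunk` is a hypothetical name, not a shell built-in); the commented lines show how it might be applied to the pme collection from above:

```javascript
// Hypothetical helper: split an array of bulkWrite operations into
// batches of at most `size` elements each.
function chunk(ops, size) {
    var batches = [];
    for (var i = 0; i < ops.length; i += size) {
        batches.push(ops.slice(i, i + size));
    }
    return batches;
}

// In the mongo shell, each batch would then be written separately:
// chunk(bulkOps, 500).forEach(function (batch) {
//     db.pme.bulkWrite(batch);
// });
```

This keeps each server round trip bounded regardless of how many documents the query matches.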

Upvotes: 1
