Reputation: 572
I have a huge collection of documents (more than two million) and I found myself querying a very small subset, using something like
scs = db.balance_sheets.find({"9087n":{$gte:40}, "20/58n":{ $lte:40000000}})
which gives fewer than 5k results. The question is, can I create a new collection with the results of this query? I tried insert:
db.scs.insert(db.balance_sheets.find({"9087n":{$gte:40}, "20/58n":{ $lte:40000000}}).toArray())
But it gives me errors: Socket say send() errno:32 Broken pipe 127.0.0.1:27017. I tried aggregate:
db.balance_sheets.aggregate([{ "9087n":{$gte:40}, "20/58n":{ $lte:40000000}} ,{$out:"pme"}])
And I get "exception: A pipeline stage specification object must contain exactly one field." Any hints? Thanks
Upvotes: 1
Views: 749
Reputation: 103365
The first option would be:
var cursor = db.balance_sheets.find({ "9087n": { "$gte": 40 }, "20/58n": { "$lte": 40000000 } });
while (cursor.hasNext()) {
    var doc = cursor.next();
    // save() upserts by _id, so re-running the loop will not create duplicates
    db.pme.save(doc);
}
As for the aggregation, the query document has to be wrapped in a $match stage; passing it directly as a pipeline stage is what triggers the "exactly one field" error. Try
db.balance_sheets.aggregate([
    {
        "$match": { "9087n": { "$gte": 40 }, "20/58n": { "$lte": 40000000 } }
    },
    { "$out": "pme" }
]);
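Note that $out must be the last stage in the pipeline, and it replaces the contents of the target collection if it already exists, so the command is safe to re-run. As a quick sanity check (a minimal sketch, assuming the shell is connected to the same database), the counts should match:

// The copied collection should hold exactly the documents the query matches
db.pme.count() === db.balance_sheets.count({ "9087n": { "$gte": 40 }, "20/58n": { "$lte": 40000000 } })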
For improved performance, especially when dealing with large collections, take advantage of the Bulk API. It sends the write operations to the server in batches (of, say, 500) rather than one request per document, which gives better performance because you make a round trip to the server only once every 500 operations.
The following demonstrates this approach. The first example uses the Bulk API, available in MongoDB versions >= 2.6 and < 3.2, to insert all the documents matching the query from the balance_sheets collection into the pme collection:
var bulk = db.pme.initializeUnorderedBulkOp(),
    counter = 0;

db.balance_sheets.find({
    "9087n": { "$gte": 40 },
    "20/58n": { "$lte": 40000000 }
}).forEach(function (doc) {
    bulk.insert(doc);
    counter++;
    if (counter % 500 == 0) {
        bulk.execute();                             // execute per 500 operations
        bulk = db.pme.initializeUnorderedBulkOp();  // and re-initialize for the next batch
    }
});

// Clean up remaining operations in the queue
if (counter % 500 != 0) { bulk.execute(); }
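An unordered bulk op lets the server apply the inserts in any order and keep going past individual failures; if you would rather have the inserts applied in cursor order and stop at the first error, the ordered builder is a drop-in replacement (everything else above stays the same):

// Ordered variant: inserts are applied serially and execution stops on the first error
var bulk = db.pme.initializeOrderedBulkOp();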
The next example applies to the new MongoDB version 3.2, which has since deprecated the Bulk API and provided a newer set of APIs using bulkWrite():
var bulkOps = db.balance_sheets.find({
    "9087n": { "$gte": 40 },
    "20/58n": { "$lte": 40000000 }
}).map(function (doc) {
    return { "insertOne": { "document": doc } };
});

db.pme.bulkWrite(bulkOps);
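bulkWrite() returns a BulkWriteResult, so capturing the return value (instead of discarding it as above) gives a quick confirmation that everything was copied:

// insertedCount should equal the number of matched documents
var result = db.pme.bulkWrite(bulkOps);
print("Inserted " + result.insertedCount + " of " + bulkOps.length + " documents into pme");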
Upvotes: 1