Reputation: 654
I have two mongo databases on different machines, host1:27017/db1 and host2:27017/db2, with an identical collection item on both databases. How do I copy selected data, say
db1.item.find({"date": { $gte : "2016-03-15" }})
from db1.item to db2.item using the mongo shell? I do not want to clone the collection (because it is huge), but copy just the selected data.
Upvotes: 2
Views: 1636
Reputation: 50416
So while it is "possible" to use the shell (and no-one said it wasn't), it's just not the "best" way.
The "best" approach is using mongodump and mongorestore. You don't need "temporary dump files" either. It's just a matter of "piping" output from one into the other.
Where you put the -h option depends on which host you actually run this from:
mongodump -h host2 -d db2 -c item \
  --query '{ "date": { "$gte": "2016-03-15" } }' \
  --out - \
  | mongorestore -d db1 -c item -
From the MongoDB 3.2 releases these commands can use compressed data as well. This needs the --gzip and --archive options:
mongodump -h host2 -d db2 -c item \
  --query '{ "date": { "$gte": "2016-03-15" } }' \
  --gzip --archive \
  | mongorestore -d db1 -c item --gzip --archive
That's generally the fastest way to move data between databases, and especially between hosts.
If you are insistent on writing this in the shell, then you should at least get it right.
Of course you can use the connect() or Mongo() methods to reference the remote connection, but that is really only part of the story, since once connected you still need to handle this efficiently.
The best way to do this is to use "Bulk Operations", as this removes the overhead of a request and acknowledgement for every new .insert() operation with the target server and collection. That saves a lot of time, though it is still not as efficient as the utilities above.
Modern MongoDB 3.2 has bulkWrite():
var db2 = connect('host2/db2');
var operations = [];

db2.item.find({ "date": { "$gte": "2016-03-15" } }).forEach(function(doc) {
    operations.push({ "insertOne": { "document": doc } });

    // Actually only write every 1000 entries at once
    if ( operations.length == 1000 ) {
        db.item.bulkWrite(operations,{ "ordered": false });
        operations = [];
    }
});

// Write any remaining
if ( operations.length > 0 ) {
    db.item.bulkWrite(operations,{ "ordered": false });
}
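The batching loop can also be wrapped in a reusable function. What follows is only a minimal sketch: copyInBatches and makeFakeTarget are hypothetical names, and the fake target is there purely so the batching logic can be run without a server. It assumes the source exposes forEach() (as a shell cursor or a plain array does) and the target exposes bulkWrite() (as a shell collection does from MongoDB 3.2).

```javascript
// Hypothetical helper: copies documents from any source that exposes forEach()
// into any target that exposes bulkWrite(), flushing every `batchSize` documents.
function copyInBatches(source, target, batchSize) {
  var operations = [];
  var copied = 0;

  source.forEach(function (doc) {
    operations.push({ insertOne: { document: doc } });

    // Flush a full batch and start a fresh array for the next one
    if (operations.length === batchSize) {
      target.bulkWrite(operations, { ordered: false });
      copied += operations.length;
      operations = [];
    }
  });

  // Flush whatever is left over after the last full batch
  if (operations.length > 0) {
    target.bulkWrite(operations, { ordered: false });
    copied += operations.length;
  }

  return copied;
}

// Stand-in for a collection (an assumption for illustration only):
// it records the size of each batch instead of writing anywhere.
function makeFakeTarget() {
  return {
    batches: [],
    bulkWrite: function (ops, options) {
      this.batches.push({ size: ops.length, ordered: options.ordered });
    }
  };
}

var fakeTarget = makeFakeTarget();
var fakeDocs = [];
for (var i = 0; i < 2500; i++) {
  fakeDocs.push({ _id: i, date: "2016-03-15" });
}
var total = copyInBatches(fakeDocs, fakeTarget, 1000);

// In the shell the same function would be called with a real cursor and
// collection, for example:
//   copyInBatches(db2.item.find({ "date": { "$gte": "2016-03-15" } }), db.item, 1000);
```

Keeping the batch size as a parameter makes it easy to try smaller or larger batches against the real collections.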
For MongoDB 2.6 releases there is another "bulk" constructor:
var db2 = connect('host2/db2');
var bulk = db.item.initializeUnorderedBulkOp();
var count = 0;

db2.item.find({ "date": { "$gte": "2016-03-15" } }).forEach(function(doc) {
    bulk.insert(doc);
    count++;

    if ( count % 1000 == 0 ) {
        bulk.execute();
        bulk = db.item.initializeUnorderedBulkOp();
    }
});

if ( count % 1000 != 0 ) {
    bulk.execute();
}
Of course the newer method is really just calling the same "older" methods underneath. The main point is consistency with other APIs, where quite often the point is to "downgrade" the operations when working with a server version earlier than MongoDB 2.6, which has no "Bulk Operations" wire protocol. In that case the API just handles the loop and commit of each operation in the batch for you.
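In either case the "unordered" approach is best, since the server is free to execute the operations in "parallel" instead of "serially" (although this is not guaranteed), which means multiple writes can be in progress at the same time. An unordered batch also keeps processing the remaining operations if one of them fails.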
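If the copy might be run more than once, or the target may already hold some of the same documents, plain inserts will raise duplicate key errors on _id. One possible variation, sketched here under the assumption that documents are matched on _id, is to send replaceOne operations with upsert through bulkWrite() instead of insertOne. The name toUpsertOperation is hypothetical.

```javascript
// Hypothetical helper: builds a bulkWrite() operation that replaces the
// document with the same _id in the target, or inserts it if none exists.
function toUpsertOperation(doc) {
  return {
    replaceOne: {
      filter: { _id: doc._id },
      replacement: doc,
      upsert: true
    }
  };
}

// Sample documents standing in for the result of the find() query
var sampleDocs = [
  { _id: 1, date: "2016-03-15" },
  { _id: 2, date: "2016-04-01" }
];

var upsertOperations = sampleDocs.map(toUpsertOperation);

// In the shell, a batch built this way would be passed along as before:
//   db.item.bulkWrite(upsertOperations, { "ordered": false });
```

The trade-off is that each upsert needs an _id lookup on the target, so this is likely slower than plain inserts into a collection known to contain none of the documents.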
So really, all of this is how the code is implemented in the external utilities anyway, and actually in a more organized and "low level" form. Naturally the "shell" does not compress data "over the wire" in communication between hosts, nor does it have access to the "low level" write functions you could use with a BSON library and low level code, both of which work much faster.
The "dump and restore" actually can work directly with a compressed BSON form of the data and commits the writes in a very efficient way. By that token, it is your best option for doing this rather than coding the implementation yourself.
Upvotes: 6
Reputation: 654
I do feel that mongodump and mongorestore are the more pervasive way to do it. That said, I was able to find a way to do it all through the mongo shell (avoiding any temporary dump files), which is what I was looking for.
[user@host1 ~]$ mongo
use db1;
var host2db2 = connect("host2:27017/db2");
host2db2.item.find({
    "date" : { $gte : "2016-03-15"}
}).forEach(function(doc){
    // "db" is the current database selected by "use db1";
    // there is no shell variable named db1
    db.item.insert(doc);
});
Credits to: Save Subset of MongoDB Collection to Another Collection
Upvotes: 0