Is it possible to update collection A using data from collection B?
Background: the goal is to work around MongoDB's lack of proper atomicity. I'm looping through some log data to generate data aggregates, and I want to know that the stored data matches what was asked to be put in. Rather than doing a two-phase commit, I would like to batch-write into a holding collection. When a batch is finished (for example, after 10,000 records or after an entire file has been read), the count of documents in the database is compared to the count the application generated; if they match, the main collection is updated from the temporary collection. The temporary collection is cleared at the beginning of the next import. This way, if the process is interrupted, it is less likely to happen during the update phase, and any error while populating the temporary collection is fixed automatically: the temporary data is wiped and the process restarts on the next startup.
Is it possible to update the main collection using the data from the temporary collection? Is this kind of update significantly faster than individual updates from the application?
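A minimal sketch of the batch-and-verify workflow described above, written against the MongoDB Node.js driver's collection methods (countDocuments, find().toArray(), bulkWrite, deleteMany). The names resetHolding and commitBatch and the toOperation callback (which turns one holding document into one bulkWrite operation) are hypothetical. Note that a bulkWrite is not atomic as a whole: each single-document write is atomic, but an interruption partway through can still leave the main collection partially updated, so this narrows the failure window rather than closing it.

```javascript
// Hypothetical helpers for the holding-collection workflow.
// `holding` and `main` are assumed to behave like MongoDB Node.js driver
// Collection objects (countDocuments, find().toArray(), bulkWrite, deleteMany).

// Step 1, at the start of an import: wipe whatever a previous,
// possibly interrupted, run left in the holding collection.
async function resetHolding(holding) {
  await holding.deleteMany({});
}

// Step 2, once a batch is complete: merge it into the main collection,
// but only if the holding collection has exactly as many documents as the
// application generated. Resolves to true if the merge was sent.
async function commitBatch(holding, main, expectedCount, toOperation) {
  const actualCount = await holding.countDocuments({});
  if (actualCount !== expectedCount) {
    // Mismatch: leave the main collection untouched; the holding data
    // will be wiped by resetHolding on the next run.
    return false;
  }
  const docs = await holding.find({}).toArray();
  if (docs.length > 0) {
    // One bulkWrite call instead of one update call per document.
    await main.bulkWrite(docs.map((doc) => toOperation(doc)));
  }
  return true;
}
```

Sending the whole batch through one bulkWrite call needs far fewer round trips than issuing one update per document from the application; how much faster that is in practice depends on the batch size and deployment, so it is worth measuring.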
Update:
The data looks something like the example below in both collections. I'm looking to merge records by finding a matching document based on the event and month fields, or creating a new one if none exists; the daily numbers would then be incremented. The process that updates the counts in the temporary collection writes once per record read, so each daily count is one write. When I have finished a batch, I would like to update the main collection with the contents of the temporary collection using a single MongoDB command.
{
  "event": "abc",
  "month": "2012-04",
  "daily": {
    "1": 82,
    "2": 6,
    "3": 12,
    "4": 23,
    "5": 62,
    ...
  }
}
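A minimal sketch of the merge described in the update, assuming documents shaped like the example above with a non-empty daily object. Each holding document becomes one upsert that matches on event and month and uses $inc with dotted paths such as "daily.1", so existing counters are incremented; when no document matches, the upsert creates one from the filter's event and month values and $inc sets each counter to the given count. The name buildMergeOp is hypothetical; the returned object follows the updateOne operation format accepted by the MongoDB Node.js driver's bulkWrite.

```javascript
// Hypothetical helper: turn one holding-collection document into an upsert
// that increments the matching main-collection document's daily counters.
// Assumes doc.daily is a non-empty object of { day: count } pairs.
function buildMergeOp(doc) {
  const inc = {};
  for (const [day, count] of Object.entries(doc.daily)) {
    // Dotted path, e.g. "daily.1", so only that day's counter is touched.
    inc[`daily.${day}`] = count;
  }
  return {
    updateOne: {
      filter: { event: doc.event, month: doc.month },
      update: { $inc: inc },
      upsert: true, // create the { event, month } document if it does not exist yet
    },
  };
}
```

An array of these operations, one per holding document, can then be passed to a single bulkWrite call on the main collection.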
Here is my attempt at this, as a beginner.
Based on the information in your question, I would use a JavaScript loop in the mongo shell to work this out.
Here is something you may be able to use as a starting point:
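// Iterate over collection_a, fetching documents from the server in batches of 10000.
db.collection_a.find().batchSize(10000).forEach(function (doc) {
  // Look up the matching document in collection_b (findOne returns null if none).
  var result = db.collection_b.findOne({ column_b: doc.column_a });
  if (result !== null) {
    // Set new_column_a on this collection_a document, matched by its _id.
    // Note: column_b equals doc.column_a by construction of the query above,
    // so substitute whichever collection_b field you actually want to copy.
    db.collection_a.updateOne(
      { _id: doc._id },
      { $set: { new_column_a: result.column_b } }
    );
  } else {
    print('Not found ' + doc.column_a);
  }
});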
The batch size can be changed in the find() call on the first line.
Each collection_a document is matched by its _id and gets a new field, new_column_a, whose value is taken from the matching collection_b document.
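The loop above issues one findOne and one update per document. A minimal sketch of the same join done client-side, assuming all of collection_b fits in memory and that column_a/column_b hold primitive values (strings or numbers, since a JavaScript Map compares objects such as ObjectIds by reference): index collection_b by column_b, then emit updateOne operations that can be sent together. The name buildCopyOps is hypothetical, and the field names are the answer's placeholders.

```javascript
// Hypothetical helper: build the same $set updates as the shell loop,
// but as a list of bulkWrite operations so they can be sent in one call.
function buildCopyOps(docsA, docsB) {
  // Index collection_b documents by column_b; keep the first match per value.
  const byColumnB = new Map();
  for (const b of docsB) {
    if (!byColumnB.has(b.column_b)) {
      byColumnB.set(b.column_b, b);
    }
  }
  const ops = [];
  const notFound = [];
  for (const a of docsA) {
    const match = byColumnB.get(a.column_a);
    if (match === undefined) {
      notFound.push(a.column_a); // same role as the loop's "Not found" print
      continue;
    }
    ops.push({
      updateOne: {
        filter: { _id: a._id },
        update: { $set: { new_column_a: match.column_b } },
      },
    });
  }
  return { ops, notFound };
}
```

With the Node.js driver, the resulting ops array could then be passed to collection_a's bulkWrite method, skipping the call when the array is empty.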