Reputation: 373
I have similar data in 5 collections in mongodb as follows (documents)
{
"_id" : ObjectId("53490030cf3b942d63cfbc7b"),
"uNr" : "abdc123abcd",
}
I want to iterate through each collection and check if there is uNr match in any collection. If there is then add that uNr and count +1 to new table. For example if there is a match in 3 collections, that it should show {"uNr" : "abcd123", "count": "3"}
Upvotes: 0
Views: 387
Reputation: 24009
If your total number of uNr values is small enough to fit in memory (at most a few million of them), you can total them client-side with a Counter and store them in a MongoDB collection:
from collections import Counter
from pymongo import MongoClient, InsertOne
db = MongoClient().my_database
counts = Counter()
for collection in [db.collection1,
db.collection2,
db.collection3]:
for doc in collection.find():
counts[doc['uNr']] += 1
# Empty the target collection.
db.counts.delete_many({})
inserts = [InsertOne({'_id': uNr, 'n': cnt}) for uNr, cnt in counts.items()]
db.counts.bulk_write(inserts)
Otherwise, query a thousand uNr values at a time and update counts in a separate collection:
from pymongo import MongoClient, UpdateOne, ASCENDING
db = MongoClient().my_database
# Empty the target collection.
db.counts.delete_many({})
db.counts.create_index([('uNr', ASCENDING)])
for collection in [db.collection1,
db.collection2,
db.collection3]:
cursor = collection.find(no_cursor_timeout=True)
# "with" statement helps ensure cursor is closed, since the server will
# never auto-close it.
with cursor:
updates = []
for doc in cursor:
updates.append(UpdateOne({'_id': doc['uNr']},
{'$inc': {'n': 1}},
upsert=True))
if len(updates) == 1000:
db.counts.bulk_write(updates)
updates = []
if updates:
# Last batch.
db.counts.bulk_write(updates)
Upvotes: 1