Reputation: 89
I am using pymongo to query MongoDB and check for duplicates in a particular collection. I have identified the duplicates, but I want to add one more filter to the script. My script is below:
from pymongo import MongoClient
client = MongoClient('localhost')
db = client.test
data = db.devices.aggregate([
    {'$group': {'_id': {'UserId': "$userId", 'DeviceType': "$deviceType"},
                'count': {"$sum": 1}}},
    {'$match': {'count': {"$gt": 1}}}
])
for doc in data:
    print(doc)  # print() works on Python 3 (the original "print _id" is Python 2 only)
With the above script, I want to check for duplicates only among documents whose device type is "email". I tried adding an "and" condition after the $match, but it didn't work.
Could you please let me know how to achieve that?
Thanks
Upvotes: 3
Views: 14489
Reputation: 312179
You can do this efficiently by prepending a $match stage to your pipeline, so that only the docs where deviceType = "email" are grouped. Filtering first means fewer documents reach $group, and a leading $match can use an index on deviceType if one exists:
data = db.devices.aggregate([
    {'$match': {'deviceType': 'email'}},
    {'$group': {'_id': {'UserId': "$userId", 'DeviceType': "$deviceType"},
                'count': {"$sum": 1}}},
    {'$match': {'count': {"$gt": 1}}}
])
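To see what the prepended $match changes, here is a minimal pure-Python sketch that mimics the three stages on an in-memory list of documents. It does not talk to MongoDB: the helper name `find_duplicates` and the sample documents are hypothetical, and it only emulates the `$match` / `$group` with `$sum: 1` / `$match` behaviour used in this pipeline.

```python
from collections import Counter


def find_duplicates(docs, device_type):
    """Emulate: $match on deviceType -> $group on (userId, deviceType) -> $match count > 1.

    Hypothetical helper for illustration only; it does not call MongoDB.
    """
    # Stage 1: {'$match': {'deviceType': device_type}} drops other docs before grouping
    matched = [d for d in docs if d.get("deviceType") == device_type]
    # Stage 2: {'$group': {'_id': {...}, 'count': {'$sum': 1}}} counts each key pair
    counts = Counter((d.get("userId"), d.get("deviceType")) for d in matched)
    # Stage 3: {'$match': {'count': {'$gt': 1}}} keeps only the duplicated pairs
    return [
        {"_id": {"UserId": user, "DeviceType": dtype}, "count": n}
        for (user, dtype), n in counts.items()
        if n > 1
    ]


# Hypothetical sample data: user 1 has duplicate "email" and duplicate "sms" entries
sample = [
    {"userId": 1, "deviceType": "email"},
    {"userId": 1, "deviceType": "email"},
    {"userId": 1, "deviceType": "sms"},
    {"userId": 1, "deviceType": "sms"},
    {"userId": 2, "deviceType": "email"},
]

print(find_duplicates(sample, "email"))
# [{'_id': {'UserId': 1, 'DeviceType': 'email'}, 'count': 2}]
```

The duplicated "sms" pair for user 1 never reaches the grouping step, which is exactly the saving the early $match buys. Against a real server, the pipeline list from the answer is what gets passed to `db.devices.aggregate(...)`.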
Upvotes: 5
Reputation: 930
I think this is a near duplicate of "using $and with $match in MongoDB".
As in that question, you may simply have a syntax error in your query. You will want something like:
{'$match': {
    '$and': [
        {'count': {'$gt': 1}},
        # after $group, the device type is nested under _id, not at the top level
        {'_id.DeviceType': {'$eq': 'email'}}
    ]
}}
If that doesn't help, please post what you have tried so far, along with any error messages.
Upvotes: 2