user3063530

Reputation: 89

How to filter data in mongo collection using pymongo

I am using pymongo to query MongoDB and check for duplicates in a particular collection. I have identified the duplicates, but I want to add one more filter to the script. My script is below:

from pymongo import MongoClient


client = MongoClient('localhost')
db = client.test

data = db.devices.aggregate([
    {'$group': {'_id':{'UserId':"$userId",'DeviceType':"$deviceType"},
                'count':{"$sum":1}}}, 
    {'$match': {'count' : {"$gt" : 1}}}
])

# Each result is a grouped document with an _id and a count
for doc in data:
    print(doc)

With the above script, I want to check for duplicates only among the documents where deviceType is "email". I tried adding an "and" condition after the match, but it didn't work.

Could you please let me know how to achieve that?

Thanks

Upvotes: 3

Views: 14489

Answers (2)

JohnnyHK

Reputation: 312179

You can do this efficiently by prepending a $match stage to your pipeline to filter the docs so that you're only grouping on the docs where deviceType = "email":

data = db.devices.aggregate([
    {'$match': {'deviceType': 'email'}},
    {'$group': {'_id': {'UserId': "$userId", 'DeviceType': "$deviceType"},
                'count': {"$sum": 1}}}, 
    {'$match': {'count': {"$gt": 1}}}
])
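If the device type needs to vary, the same pipeline can be built by a small helper. This is only a sketch: the function name build_duplicate_pipeline is made up for illustration, and the field names userId and deviceType are taken from the question.

```python
def build_duplicate_pipeline(device_type):
    """Return stages that find duplicate (userId, deviceType) pairs.

    Hypothetical helper; field names are assumed from the question.
    """
    return [
        # Filter first so $group only sees the relevant documents.
        {"$match": {"deviceType": device_type}},
        # Count documents per (userId, deviceType) pair.
        {"$group": {
            "_id": {"UserId": "$userId", "DeviceType": "$deviceType"},
            "count": {"$sum": 1},
        }},
        # Keep only the pairs that occur more than once.
        {"$match": {"count": {"$gt": 1}}},
    ]


pipeline = build_duplicate_pipeline("email")
# With a live connection this would be run as:
#     for doc in db.devices.aggregate(pipeline):
#         print(doc)
```

Putting the filter in the first stage means the $group stage processes fewer documents, and a leading $match can use an index on deviceType if one exists.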

Upvotes: 5

Keyan P

Reputation: 930

I think this is a near duplicate of using $and with $match in mongodb.

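As in that question, I believe you may simply have a syntax error in your query. You will want something like the following as the stage after $group (the grouped device type lives under _id, so it is referenced as _id.DeviceType):

{'$match': {
    '$and': [
        {'count': {"$gt": 1}},
        {'_id.DeviceType': {"$eq": "email"}}
    ]
}}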

If that doesn't help then please paste what you have tried so far as well as any error message output.
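For reference, a combined filter after the group stage does not strictly need $and, because conditions on different keys in one $match document are ANDed implicitly. The sketch below assumes the field names from the question; build_post_group_pipeline is a hypothetical name. Note that after $group the device type is nested under _id, so it is addressed with dot notation.

```python
def build_post_group_pipeline(device_type):
    """Return stages that group first, then filter on count and device type.

    Hypothetical helper; field names are assumed from the question.
    """
    return [
        {"$group": {
            "_id": {"UserId": "$userId", "DeviceType": "$deviceType"},
            "count": {"$sum": 1},
        }},
        # Both conditions must hold; separate keys are ANDed implicitly.
        # The grouped value is addressed as _id.DeviceType.
        {"$match": {"count": {"$gt": 1}, "_id.DeviceType": device_type}},
    ]


pipeline = build_post_group_pipeline("email")
# With a live connection: db.devices.aggregate(pipeline)
```

This version groups every document before filtering, so it is likely to do more work than filtering on deviceType in a first $match stage.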

Upvotes: 2
