Nedim
Nedim

Reputation: 573

Count ignoring duplicate documents

I'd like to count all emails within the specific project (ID: 7), but ignoring duplicate rows in ONE campaign.

Here's the example of my collection structure:

{
    "_id" : ObjectId("581a9054c274f7b512e8ed94"),
    "email" : "[email protected]",
    "IDproject" : 7,
    "IDcampaign" : 10
}

{
    "_id" : ObjectId("581a9064c274f7b512e8ed95"),
    "email" : "[email protected]",
    "IDproject" : 7,
    "IDcampaign" : 10
}

{
    "_id" : ObjectId("581a9068c274f7b512e8ed96"),
    "email" : "[email protected]",
    "IDproject" : 7,
    "IDcampaign" : 10
}

{
    "_id" : ObjectId("581a906cc274f7b512e8ed97"),
    "email" : "[email protected]",
    "IDproject" : 7,
    "IDcampaign" : 11
}

{
    "_id" : ObjectId("581a9072c274f7b512e8ed98"),
    "email" : "[email protected]",
    "IDproject" : 7,
    "IDcampaign" : 11
}

{
    "_id" : ObjectId("581a9079c274f7b512e8ed99"),
    "email" : "[email protected]",
    "IDproject" : 7,
    "IDcampaign" : 12
}

This is what the result should be:

[email protected] 
[email protected] 
[email protected] 
[email protected] 
[email protected]

Total: 5 (of 6). Note that [email protected] is mentioned twice. That's because [email protected] has campaigns 10, 10 and 11. We're ignoring one 10.

This is what I've tried:

db.mycollection.aggregate([
    {$match : {IDproject : 7}},
    {$group : {_id : "$email", total : {$sum : 1}}}
])

But it returns only unique emails ignoring IDcampaign. Also, I can get unique number of emails with the following query:

db.mycollection.distinct('email', {IDproject : 7})

But again, it shows only unique emails ignoring IDcampaign.

Could someone give me a hint how to count emails including IDcampaign?

Thanks.

p.s. I use MongoDB with PHP, and I can solve the problem with PHP calculations, but that's not the solution.

Upvotes: 1

Views: 1067

Answers (1)

chridam
chridam

Reputation: 103365

Include it as part of your group key, as in the following example:

db.mycollection.aggregate([
    { "$match": { "IDproject": 7 } },
    {
        "$group": {
            "_id": {
                "email" : "$email",                
                "IDcampaign" : "$IDcampaign"
            },
            "count": { "$sum": 1 }
        }
    }
])

Sample Output

/* 1 */
{
    "_id" : {
        "email" : "[email protected]",
        "IDcampaign" : 10
    },
    "count" : 1
}

/* 2 */
{
    "_id" : {
        "email" : "[email protected]",
        "IDcampaign" : 12
    },
    "count" : 1
}

/* 3 */
{
    "_id" : {
        "email" : "[email protected]",
        "IDcampaign" : 11
    },
    "count" : 1
}

/* 4 */
{
    "_id" : {
        "email" : "[email protected]",
        "IDcampaign" : 10
    },
    "count" : 2
}

/* 5 */
{
    "_id" : {
        "email" : "[email protected]",
        "IDcampaign" : 11
    },
    "count" : 1
}

To answer your follow-up question on getting the count only since you don't need the list of emails, you could run the following pipeline

db.mycollection.aggregate([
    { "$match": { "IDproject": 7 } },
    {
        "$group": {
            "_id": null,
            "count": { "$sum": 1 },
            "emails": {
                "$addToSet": {
                    "email" : "$email",                
                    "IDcampaign" : "$IDcampaign"
                }
            }
        }
    },
    {
        "$project": {
            "_id": 0,
            "count": 1,
            "total": { "$size": "$emails" }
        }
    }
])

which gives you the result

{        
    "total" : 5,
    "count" : 6
}

that you can interpret as Total 5 (of 6)

Upvotes: 4

Related Questions