elnino

Reputation: 173

MongoDB count total of three fields

I am a beginner in MongoDB. I have 20 million documents in a MongoDB collection. Example of a single document:

[Image: example document]

I want to get the total tweet count, friends count and followers count of each user_screen_name along with each user_screen_name's account creation year.

I tried this:

user_details_pipeline = [{
    "$project": {
        "_id": 0,
        "user_mentions": 1,
        "user_followers_count": 1,
        "user_friends_count": 1,
        "user_account_creation": 1
    }
}, {
    "$unwind": "$user_mentions"
}, {
    "$group": {
        "_id": "$user_mentions.screen_name",
        "count": {
            "$sum": 1
        }
    }
}, {
    "$sort": {
        "count": -1
    }
}
]

But this does not return the tweet, follower and friend counts or the account creation date; it only counts how many times each user is mentioned. Can anyone please help?

Example of one output:

{"user_screen_name": "BorisJohnson", "user_followers_count": 1000000, "user_friends_count": 50, "total_tweets": 50000, "user_account_creation": 2012}

Upvotes: 4

Views: 164

Answers (3)

Chris

Reputation: 4411

You can do this using aggregation. You are simply projecting the desired fields, but aggregation provides the $year operator (and others) to return the desired component from a date. You should ensure that the dates have been stored as ISODate().

db.collection.aggregate(
    [
        {
            "$project": {
                "_id": 0,
                "user_screen_name": 1,
                "user_followers_count": 1,
                "user_friends_count": 1,
                "total_tweets": 1,
                "user_account_creation": {
                    "$year": "$user_account_creation"
                }
            }
        }
    ]
)

If you are not using ISODate(), you can use the $dateFromString operator with format specifiers to get an ISODate(), then use that as needed.

{
    "$dateFromString": {
        "dateString": "06-15-2018",
        "format": "%m-%d-%Y"
    }
}

However, the format specifiers listed in the documentation only work with numerical components and not words (like “Thurs” or “Jan”).
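If the server-side format specifiers cannot handle the stored string, one option is to parse it on the client before (re)inserting it as a real date. The following is only a sketch: it assumes the field holds a Twitter-style timestamp such as "Fri Jun 15 09:30:00 +0000 2018", which the question does not confirm, and `creation_year` is a hypothetical helper name.

```python
from datetime import datetime

# Assumed layout of the stored string (Twitter-style "created_at").
# The real format is not shown in the question, so adjust as needed.
# Note: %a and %b are locale-dependent; this expects English names.
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def creation_year(raw: str) -> int:
    """Parse a Twitter-style timestamp string and return its year."""
    return datetime.strptime(raw, TWITTER_FORMAT).year


print(creation_year("Fri Jun 15 09:30:00 +0000 2018"))
```

The same `datetime.strptime` call returns a full `datetime`, which PyMongo stores as a BSON date, so the `$year` operator would then work on it directly.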

Upvotes: 0

Belly Buster

Reputation: 8834

If your date is not a date type (it really should be though!), you can use the following aggregation, which uses $substr to slice out the year and $min to keep the earliest year:

result = db.mycollection.aggregate([
    {'$group': {
        '_id': '$user_screen_name',
        'user_followers_count': {'$sum': '$user_followers_count'},
        'user_friends_count': {'$sum': '$user_friends_count'},
        'total_tweets': {'$sum': '$user_tweets_count'},
        'user_account_creation': {'$min': {'$substr': ['$user_account_creation', 26, 4]}}
    }},
    {'$project': {
        '_id': 0,
        'user_screen_name': '$_id',
        'user_followers_count': 1,
        'user_friends_count': 1,
        'total_tweets': 1,
        'user_account_creation': 1
    }}
])
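To see what this $group stage computes, here is a rough in-memory equivalent in plain Python. The sample documents are made up, the field names follow the query above, and the date strings assume a Twitter-style layout in which the year starts at offset 26; `group_by_user` is a hypothetical helper, not part of any library.

```python
# Made-up sample documents; values and date layout are assumptions.
docs = [
    {"user_screen_name": "alice", "user_followers_count": 10,
     "user_friends_count": 5, "user_tweets_count": 100,
     "user_account_creation": "Fri Jun 15 09:30:00 +0000 2018"},
    {"user_screen_name": "alice", "user_followers_count": 12,
     "user_friends_count": 5, "user_tweets_count": 101,
     "user_account_creation": "Fri Jun 15 09:30:00 +0000 2018"},
    {"user_screen_name": "bob", "user_followers_count": 3,
     "user_friends_count": 2, "user_tweets_count": 7,
     "user_account_creation": "Tue Mar 06 18:00:00 +0000 2012"},
]


def group_by_user(documents):
    """Mimic the $group stage: sum the counters, keep the smallest year."""
    totals = {}
    for doc in documents:
        # Equivalent of {'$substr': [field, 26, 4]}: 4 characters from offset 26.
        year = doc["user_account_creation"][26:30]
        entry = totals.setdefault(doc["user_screen_name"], {
            "user_followers_count": 0,
            "user_friends_count": 0,
            "total_tweets": 0,
            "user_account_creation": year,
        })
        entry["user_followers_count"] += doc["user_followers_count"]
        entry["user_friends_count"] += doc["user_friends_count"]
        entry["total_tweets"] += doc["user_tweets_count"]
        # Equivalent of $min on the year strings (string comparison).
        entry["user_account_creation"] = min(entry["user_account_creation"], year)
    return totals


result = group_by_user(docs)
print(result["alice"])
```

Because $sum adds the value from every document in the group, a user who appears in two documents gets both values added together, as "alice" does here.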

Upvotes: 1

Ashh

Reputation: 46481

You basically need to use $group aggregation here.

db.collection.aggregate([
    { $group: {
        _id: "$user_screen_name",
        user_followers_count: { $sum: "$user_followers_count" },
        user_friends_count: { $sum: "$user_friends_count" },
        total_tweets: { $sum: "$total_tweets" },
        user_account_creation: { $first: "$user_account_creation" }
    }}
])
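Since the question builds its pipeline in Python, the same $group stage could be written for PyMongo roughly as follows. This is a sketch under assumptions: the field names are copied from the pipeline above, `build_user_totals_pipeline` is a hypothetical helper, and the database and collection names in the commented usage are placeholders.

```python
def build_user_totals_pipeline():
    """Return the $group pipeline plus a $project that renames _id."""
    return [
        {"$group": {
            "_id": "$user_screen_name",
            "user_followers_count": {"$sum": "$user_followers_count"},
            "user_friends_count": {"$sum": "$user_friends_count"},
            "total_tweets": {"$sum": "$total_tweets"},
            "user_account_creation": {"$first": "$user_account_creation"},
        }},
        # Reshape the output so the name appears as user_screen_name.
        {"$project": {
            "_id": 0,
            "user_screen_name": "$_id",
            "user_followers_count": 1,
            "user_friends_count": 1,
            "total_tweets": 1,
            "user_account_creation": 1,
        }},
    ]


pipeline = build_user_totals_pipeline()

# Usage with PyMongo (needs a running server; names are placeholders):
# from pymongo import MongoClient
# tweets = MongoClient()["mydb"]["tweets"]
# for row in tweets.aggregate(pipeline, allowDiskUse=True):
#     print(row)
```

With around 20 million documents the $group stage may exceed the server's in-memory limit for a single stage, which is why the commented call passes `allowDiskUse=True`; whether that is needed depends on the server version and the number of distinct users.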

Upvotes: 1
