Reputation: 173
I am a beginner in MongoDB. I have 20 million documents in a MongoDB collection. Example of a single document:
I want to get the total tweet count, friends count and followers count of each user_screen_name along with each user_screen_name's account creation year.
I tried this:
user_details_pipeline = [{
"$project": {
"_id": 0,
"user_mentions": 1,
"user_followers_count": 1,
"user_friends_count": 1,
"user_account_creation": 1
}
}, {
"$unwind": "$user_mentions"
}, {
"$group": {
"_id": "$user_mentions.screen_name",
"count": {
"$sum": 1
}
}
}, {
"$sort": {
"count": -1
}
}
]
But this does not query the count of tweets, followers, friends, or the account creation date; it only counts how many times each user is mentioned. Can anyone please help?
Example of one output:
{"user_screen_name": "BorisJohnson", "user_followers_count": 1000000, "user_friends_count": 50, "total_tweets": 50000, "user_account_creation": 2012}
Upvotes: 4
Views: 164
Reputation: 4411
You can do this using aggregation. You are simply projecting the desired fields, but aggregation provides the $year operator (and others) to return the desired component from a date. You should ensure that the dates have been stored as ISODate().
db.collection.aggregate(
  [
    {
      "$project" : {
        "_id": 0,
        "user_screen_name": 1,
        "user_followers_count": 1,
        "user_friends_count": 1,
        "total_tweets": 1,
        "user_account_creation": {
          "$year" : "$user_account_creation"
        }
      }
    }
  ]
)
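Since the question builds its pipeline as a Python list, here is a minimal sketch of the same $project stage in that form. The helper name build_year_projection is made up for illustration, and the field names are taken from the question's expected output; it assumes the date field holds a real BSON date, since $year is not expected to accept a plain string.

```python
def build_year_projection(date_field="user_account_creation"):
    """Return a one-stage pipeline that keeps the user stats and
    replaces the creation date with its year (hypothetical helper)."""
    return [
        {
            "$project": {
                "_id": 0,
                "user_screen_name": 1,
                "user_followers_count": 1,
                "user_friends_count": 1,
                "total_tweets": 1,
                # Assumes date_field is stored as a BSON date, not a string.
                date_field: {"$year": "$" + date_field},
            }
        }
    ]


pipeline = build_year_projection()
# With a pymongo collection (assumed, not created here):
# result = db.collection.aggregate(pipeline)
```

Building the stage in a function only makes it easy to reuse with a different date field; passing the list literal directly to aggregate() works just as well.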
If you are not using ISODate(), you can use the $dateFromString operator with format specifiers to get an ISODate(), then use that as needed.
{
"$dateFromString": {
"dateString": "06-15-2018",
"format": "%m-%d-%Y"
}
}
However, the format specifiers listed in the documentation only work with numerical components and not words (like “Thurs” or “Jan”).
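If the stored strings do contain day and month names, one workaround is to parse them on the client side. This is only a sketch: it assumes the dates look like the usual Twitter layout ("Thu Apr 06 15:24:15 +0000 2017"), which the question does not actually show, and the function name parse_twitter_date is made up for illustration.

```python
from datetime import datetime

# Assumed layout: weekday, month name, day, time, UTC offset, year.
# %a and %b are locale-dependent; this expects English abbreviations.
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_twitter_date(raw):
    """Parse a Twitter-style date string into a timezone-aware datetime."""
    return datetime.strptime(raw, TWITTER_FORMAT)


created = parse_twitter_date("Thu Apr 06 15:24:15 +0000 2017")
print(created.year)  # 2017
```

The parsed datetime could then be written back to the collection as a proper date, after which $year works on it directly.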
Upvotes: 0
Reputation: 8834
If your date is not a date type (it really should be though!), you can use the following aggregate query, which uses $substr (to slice out the year) and $min (to get the earliest year):
result = db.mycollection.aggregate([
{'$group': {
'_id': '$user_screen_name',
'user_followers_count': {'$sum': '$user_followers_count'},
'user_friends_count': {'$sum': '$user_friends_count'},
'total_tweets': {'$sum': '$user_tweets_count'},
'user_account_creation': {'$min': {'$substr': ['$user_account_creation', 26, 4]}}
}},
{'$project': {
'_id': 0,
'user_screen_name': '$_id',
'user_followers_count': 1,
'user_friends_count': 1,
'total_tweets': 1,
'user_account_creation': 1
}}
])
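To make it clearer what this pipeline computes, here is a rough in-memory equivalent in plain Python. The sample documents are invented, and the offset 26 only lines up with the year if the strings follow the Twitter layout assumed here. Note that $sum adds the value from every document, so if each tweet document repeats the user's current follower count, the total grows with the number of documents.

```python
from collections import defaultdict


def summarize(docs):
    """Mimic the $group stage: sum the counters per user and keep the
    smallest 4-character year sliced from offset 26 of the date string."""
    totals = defaultdict(lambda: {
        "user_followers_count": 0,
        "user_friends_count": 0,
        "total_tweets": 0,
        "user_account_creation": None,
    })
    for doc in docs:
        row = totals[doc["user_screen_name"]]
        row["user_followers_count"] += doc["user_followers_count"]
        row["user_friends_count"] += doc["user_friends_count"]
        row["total_tweets"] += doc["user_tweets_count"]
        year = doc["user_account_creation"][26:30]  # like $substr [.., 26, 4]
        if row["user_account_creation"] is None or year < row["user_account_creation"]:
            row["user_account_creation"] = year
    return dict(totals)


# Invented sample data, one document per tweet.
docs = [
    {"user_screen_name": "alice", "user_followers_count": 10, "user_friends_count": 2,
     "user_tweets_count": 1, "user_account_creation": "Thu Apr 06 15:24:15 +0000 2017"},
    {"user_screen_name": "alice", "user_followers_count": 10, "user_friends_count": 2,
     "user_tweets_count": 1, "user_account_creation": "Thu Apr 06 15:24:15 +0000 2017"},
    {"user_screen_name": "bob", "user_followers_count": 5, "user_friends_count": 1,
     "user_tweets_count": 1, "user_account_creation": "Mon Jan 02 08:00:00 +0000 2012"},
]
summary = summarize(docs)
```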
Upvotes: 1
Reputation: 46481
You basically need to use $group aggregation here.
db.collection.aggregate([
{ $group: {
_id: "$user_screen_name",
user_followers_count: { $sum: "$user_followers_count" },
user_friends_count: { $sum: "$user_friends_count" },
total_tweets: { $sum: "$total_tweets" },
user_account_creation: { $first: "$user_account_creation" }
}}
])
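For what it's worth, the same $group stage can be written as a Python pipeline and followed by a $project that turns the creation date into a year. This is only a sketch: the helper name build_user_summary_pipeline is made up, the field names are copied from the answers in this thread, and the $year step assumes the field is stored as a real date.

```python
def build_user_summary_pipeline():
    """Group by screen name, then reshape the output and extract the year
    (hypothetical helper; assumes user_account_creation is a BSON date)."""
    return [
        {"$group": {
            "_id": "$user_screen_name",
            "user_followers_count": {"$sum": "$user_followers_count"},
            "user_friends_count": {"$sum": "$user_friends_count"},
            "total_tweets": {"$sum": "$total_tweets"},
            # $first keeps the value from the first document seen per group.
            "user_account_creation": {"$first": "$user_account_creation"},
        }},
        {"$project": {
            "_id": 0,
            "user_screen_name": "$_id",
            "user_followers_count": 1,
            "user_friends_count": 1,
            "total_tweets": 1,
            "user_account_creation": {"$year": "$user_account_creation"},
        }},
    ]


pipeline = build_user_summary_pipeline()
# With pymongo (assumed, not created here); allowDiskUse may help on a
# collection of this size:
# result = db.collection.aggregate(pipeline, allowDiskUse=True)
```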
Upvotes: 1