Reputation: 127
I have the following DataFrame:
It contains user IDs, tweets, locations, and the classification of each tweet as negative or positive.
I want to create a new DataFrame grouped by user ID, since each user has more than one tweet in the DataFrame. The new DataFrame should contain the following columns:
Required sample output:
user_id  positive_tweets  negative_tweets  Location
418      1                0                CA
521      1                0                CA
997      0                1                LA
1135     1                0                LA
This code was suggested by Mr. BlackFox in answer to my previous question, which I did not ask correctly:
df.groupby(['user_id','classification'])['user_id'].count()
However, it does not match the required output.
Thanks.
Upvotes: 1
Views: 250
Reputation: 51
I hope that's what you are looking for.
import pandas as pd

# Count positive and negative tweets for each (user_id, Location) pair.
df.groupby(['user_id', 'Location']).apply(lambda x: pd.Series(dict(
    positive_tweets=(x.classification == 'positive').sum(),
    negative_tweets=(x.classification == 'negative').sum(),
)))
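If you want user_id and Location as ordinary columns rather than an index, one alternative is to count with groupby/size and pivot the classification level with unstack. This is only a sketch: the question's DataFrame is not shown, so the sample rows below are made up, and I am assuming the classification column holds the strings 'positive' and 'negative' and that each user has a single Location.

```python
import pandas as pd

# Hypothetical sample data; the real DataFrame from the question is not shown.
df = pd.DataFrame({
    "user_id": [418, 521, 997, 1135, 418],
    "tweet": ["a", "b", "c", "d", "e"],
    "Location": ["CA", "CA", "LA", "LA", "CA"],
    "classification": ["positive", "positive", "negative", "positive", "negative"],
})

result = (
    df.groupby(["user_id", "Location", "classification"])
    .size()                      # rows per (user, location, class)
    .unstack(fill_value=0)       # one column per classification value
    # Guarantee both columns exist even if one class never occurs.
    .reindex(columns=["positive", "negative"], fill_value=0)
    .rename(columns={"positive": "positive_tweets",
                     "negative": "negative_tweets"})
    .rename_axis(None, axis=1)   # drop the leftover "classification" label
    .reset_index()               # turn user_id and Location back into columns
)

# Reorder columns to match the layout requested in the question.
result = result[["user_id", "positive_tweets", "negative_tweets", "Location"]]
print(result)
```

The reindex step is there so that a dataset with no negative (or no positive) tweets still produces both count columns.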
Upvotes: 2