Create a new DataFrame that counts positive and negative tweets for each user

Question

I have the following DataFrame:

It contains user_ids, tweets, locations, and the classification of each tweet as negative or positive.

I want to create a new DataFrame grouped by user_id, since each user has more than one tweet in the DataFrame. The new DataFrame should contain the following columns:

user_id
count of negative tweets by that user_id
count of positive tweets by that user_id
location of the user

Required sample output:

user_id   positive_tweets   negative_tweets   Location
418       1                 0                 CA
521       1                 0                 CA
997       0                 1                 LA
1135      1                 0                 LA

This code was suggested by Mr. BlackFox in response to my previous question, which I didn't ask correctly:

df.groupby(['user_id','classification'])['user_id'].count()

However, its output does not match the required output.

Thanks

Jazz The Rabbit · Accepted Answer

I hope that's what you are looking for.

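# Group by user and location, then count each class within every group.
# reset_index() turns user_id and Location back into regular columns.
df.groupby(['user_id', 'Location']).apply(lambda x: pd.Series(dict(
    positive_tweets=(x.classification == 'positive').sum(),
    negative_tweets=(x.classification == 'negative').sum(),
))).reset_index()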
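
As a possible alternative, the groupby/count idea from the question can be extended by also grouping on Location and pivoting the classification level into columns. The sketch below is only an illustration: the sample rows, the 'tweet' column name, and the variable name counts are made up, since the original DataFrame is not shown, and it assumes the classification column holds the strings 'positive' and 'negative'.

```python
import pandas as pd

# Hypothetical sample data; the real DataFrame from the question is not shown.
df = pd.DataFrame({
    "user_id": [418, 521, 997, 1135, 418],
    "tweet": ["a", "b", "c", "d", "e"],
    "Location": ["CA", "CA", "LA", "LA", "CA"],
    "classification": ["positive", "positive", "negative", "positive", "negative"],
})

counts = (
    df.groupby(["user_id", "Location", "classification"])
      .size()                                    # rows per (user, location, class)
      .unstack("classification", fill_value=0)   # one column per class, 0 if absent
      .reindex(columns=["positive", "negative"], fill_value=0)  # keep both columns even if one class never occurs
      .rename(columns={"positive": "positive_tweets",
                       "negative": "negative_tweets"})
      .rename_axis(columns=None)                 # drop the leftover 'classification' label
      .reset_index()                             # user_id and Location become columns
)

# Reorder to match the column order of the required output.
counts = counts[["user_id", "positive_tweets", "negative_tweets", "Location"]]
print(counts)
```

This avoids the Python-level lambda, which may matter on large DataFrames. Note that a user who tweets from more than one Location would get one row per location with either approach.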
