ahmad noori

Reputation: 127

Create a new DataFrame that counts positive and negative tweets for each user

I have a DataFrame that contains user IDs, tweets, locations, and the classification of each tweet as negative or positive.

I want to create a new DataFrame grouped by user ID, since each user has more than one tweet in the original DataFrame. The new DataFrame should contain the following columns:

  1. user_id
  2. count of negative tweets by that user_id
  3. count of positive tweets by that user_id
  4. location of the user

Required sample output:

user_id    positive_tweets    negative_tweets    Location
418        1                  0                  CA
521        1                  0                  CA
997        0                  1                  LA
1135       1                  0                  LA

This code was suggested by Mr. BlackFox in answer to my previous question, which I did not ask correctly:

df.groupby(['user_id','classification'])['user_id'].count()

However, it does not match the required output.
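
For reference, the mismatch seems to come from the shape of the result: grouping by both user_id and classification returns a long-format Series with one row per (user_id, classification) pair, not one column per class. Below is a minimal sketch with made-up data (the column names follow the question; the values are assumptions) that shows this shape and one way to pivot it with unstack:

```python
import pandas as pd

# Made-up sample data; column names follow the question, values are assumptions.
df = pd.DataFrame({
    "user_id": [418, 521, 997, 1135],
    "Location": ["CA", "CA", "LA", "LA"],
    "classification": ["positive", "positive", "negative", "positive"],
})

# The suggested code: one row per (user_id, classification) pair (long format).
counts = df.groupby(["user_id", "classification"])["user_id"].count()
print(counts)

# Pivot the classification level into columns; absent pairs are filled with 0.
wide = counts.unstack(fill_value=0)
print(wide)
```

Note that this pivoted result still lacks the Location column, so it only partly matches the required output.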

Thanks

Upvotes: 1

Views: 250

Answers (1)

Jazz The Rabbit

Reputation: 51

I hope that's what you are looking for.

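import pandas as pd

# Count positive and negative tweets within each (user_id, Location) group,
# then turn the group keys back into ordinary columns.
df.groupby(['user_id', 'Location']).apply(lambda x: pd.Series(dict(
    positive_tweets=(x.classification == 'positive').sum(),
    negative_tweets=(x.classification == 'negative').sum(),
))).reset_index()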
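
If it helps, here is a self-contained sketch of an alternative that avoids apply: it builds two boolean columns and sums them per group. The sample data is made up for illustration, and it assumes each user_id has a single Location (otherwise a user would appear on several rows).

```python
import pandas as pd

# Made-up sample data; column names follow the question, values are assumptions.
df = pd.DataFrame({
    "user_id": [418, 418, 521, 997, 1135],
    "Location": ["CA", "CA", "CA", "LA", "LA"],
    "classification": ["positive", "negative", "positive", "negative", "positive"],
})

result = (
    # Flag each tweet as positive / negative; True counts as 1 when summed.
    df.assign(
        positive_tweets=df["classification"].eq("positive"),
        negative_tweets=df["classification"].eq("negative"),
    )
    # Keep the group keys as ordinary columns instead of an index.
    .groupby(["user_id", "Location"], as_index=False)[["positive_tweets", "negative_tweets"]]
    .sum()
)
print(result)
```

Summing boolean flags is usually faster than apply on large frames, since the aggregation does not call a Python function once per group.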

Upvotes: 2
