tumbleweed
tumbleweed

Reputation: 4640

How to return the frequency of each value for every item inside of a pandas dataframe?

I have a very large pandas dataframe like this:

id   fruit
---|------
101  apple
102  apple
101  watermelon
101  orange
102  orange
104  lemon
105  lemon
104  lemon
105  apple
103  grapes

How can I create a count dataframe, where each id is representing the frequency of fruits across the columns, something like this:

     apple   watermelon  orange  lemon  grapes

101    1         1         1       0      0
102    2         0         1       0      0
103    0         0         0       0      1
104    0         0         0       0      2
105    1         0         0       0      0

I tried to:

new_df = df.groupby(['id','fruit']).count()
new_df

And

new_df = df[['id','fruit']].groupby(['id','fruit']).count()
new_df

And

new_df = df[['id','fruit']].groupby(df['fruit'].tolist()).count()
new_df

However, I am not getting the expected output. Any idea of how to create the desired output from the dataframe?

Upvotes: 1

Views: 45

Answers (1)

BENY
BENY

Reputation: 323236

Use: .crosstab()

pd.crosstab(df.id,df.fruit)
Out[251]: 
fruit  apple  grapes  lemon  orange  watermelon
id                                             
101        1       0      0       1           1
102        1       0      0       1           0
103        0       1      0       0           0
104        0       0      2       0           0
105        1       0      1       0           0

Upvotes: 2

Related Questions