Reputation: 4640
I have a very large pandas dataframe like this:
id fruit
---|------
101 apple
102 apple
101 watermelon
101 orange
102 orange
104 lemon
105 lemon
104 lemon
105 apple
103 grapes
How can I create a count dataframe, where each id is representing the frequency of fruits across the columns, something like this:
apple watermelon orange lemon grapes
101 1 1 1 0 0
102 2 0 1 0 0
103 0 0 0 0 1
104 0 0 0 0 2
105 1 0 0 0 0
I tried to:
new_df = df.groupby(['id','fruit']).count()
new_df
And
new_df = df[['id','fruit']].groupby(['id','fruit']).count()
new_df
And
new_df = df[['id','fruit']].groupby(df['fruit'].tolist()).count()
new_df
However, I am not getting the expected output. Any idea of how to create the desired output from the dataframe?
Upvotes: 1
Views: 45
Reputation: 323236
Use: .crosstab()
pd.crosstab(df.id,df.fruit)
Out[251]:
fruit apple grapes lemon orange watermelon
id
101 1 0 0 1 1
102 1 0 0 1 0
103 0 1 0 0 0
104 0 0 2 0 0
105 1 0 1 0 0
Upvotes: 2