Reputation: 229
Say we have a DataFrame that looks like this:
  day_of_week   ice_cream  count  proportion
0      Friday     vanilla    638    0.094473
1      Friday   chocolate   2048    0.663506
2      Friday  strawberry   4088    0.251021
3      Monday     vanilla    448    0.079736
4      Monday   chocolate   2332    0.691437
5      Monday  strawberry    441    0.228828
6    Saturday     vanilla     24    0.073350
7    Saturday   chocolate    244    0.712930
...
I want a new DataFrame that collapses onto day_of_week as an index, so it looks like this:
  day_of_week   vanilla  chocolate  strawberry
0      Friday  0.094473   0.663506    0.251021
1      Monday  0.079736   0.691437    0.228828
2    Saturday       ...        ...         ...
What's the cleanest way I can implement this?
Upvotes: 6
Views: 4117
Reputation: 294308
Using set_index and unstack:
df.set_index(['day_of_week', 'ice_cream'])['proportion'].unstack() \
    .reset_index().rename_axis(None, axis=1)
  day_of_week  chocolate  strawberry   vanilla
0      Friday   0.663506    0.251021  0.094473
1      Monday   0.691437    0.228828  0.079736
2    Saturday   0.712930         NaN  0.073350
[Figure: timing vs pivot_table]
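To compare the two approaches on your own data, a minimal sketch using timeit could look like the following. The sample rows are copied from the question; the helper names with_unstack and with_pivot_table and the repetition count are assumptions made for illustration, and timings will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Sample rows copied from the question (the elided rows are left out).
df = pd.DataFrame({
    "day_of_week": ["Friday"] * 3 + ["Monday"] * 3 + ["Saturday"] * 2,
    "ice_cream": ["vanilla", "chocolate", "strawberry"] * 2 + ["vanilla", "chocolate"],
    "proportion": [0.094473, 0.663506, 0.251021,
                   0.079736, 0.691437, 0.228828,
                   0.073350, 0.712930],
})


def with_unstack(frame):
    # Hypothetical helper: reshape via set_index + unstack.
    return frame.set_index(["day_of_week", "ice_cream"])["proportion"].unstack()


def with_pivot_table(frame):
    # Hypothetical helper: reshape via pivot_table (aggregates with mean by default).
    return frame.pivot_table(values="proportion", index="day_of_week", columns="ice_cream")


repeats = 200  # arbitrary repetition count for the measurement
for fn in (with_unstack, with_pivot_table):
    seconds = timeit.timeit(lambda: fn(df), number=repeats)
    print(f"{fn.__name__}: {seconds / repeats * 1e3:.3f} ms per call")
```

On a frame this small the numbers mostly reflect fixed overhead, so it is worth repeating the measurement on data of realistic size.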
Upvotes: 1
Reputation: 36645
Use pivot_table:
import pandas as pd
import numpy as np
df = pd.DataFrame({'count': [200, 300, 100, 50, 110, 90],
                   'day_of_week': ['Friday', 'Sunday', 'Monday', 'Sunday', 'Friday', 'Friday'],
                   'ice_cream': ['choco', 'vanilla', 'vanilla', 'choco', 'choco', 'straw'],
                   'proportion': [.9, .1, .2, .3, .8, .4]})
print(df)
# Missing combinations stay NaN; pass fill_value=0 instead to replace them with zero
tab = pd.pivot_table(df, index='day_of_week', columns='ice_cream', values=['proportion'], fill_value=np.nan)
print(tab)
Output:
   count day_of_week ice_cream  proportion
0    200      Friday     choco         0.9
1    300      Sunday   vanilla         0.1
2    100      Monday   vanilla         0.2
3     50      Sunday     choco         0.3
4    110      Friday     choco         0.8
5     90      Friday     straw         0.4
            proportion
ice_cream        choco straw vanilla
day_of_week
Friday            0.85   0.4     NaN
Monday             NaN   NaN     0.2
Sunday            0.30   NaN     0.1
Upvotes: 1
Reputation: 17506
df.pivot_table is the correct solution:
In[31]: df.pivot_table(values='proportion', index='day_of_week', columns='ice_cream').reset_index()
Out[31]:
ice_cream day_of_week  chocolate  strawberry   vanilla
0              Friday   0.663506    0.251021  0.094473
1              Monday   0.691437    0.228828  0.079736
2            Saturday   0.712930         NaN  0.073350
If you leave out reset_index(), it returns a DataFrame indexed by day_of_week, which might be more useful for you.
Note that a pivot table necessarily performs a dimensionality reduction when the values column is not a function of the tuple (index, columns). If several rows share the same (index, columns) pair but have different values, pivot_table brings the dimensionality down to one by using an aggregation function, mean by default.
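To make the aggregation point concrete, here is a minimal sketch on made-up data (the three rows below are hypothetical, not from the question) in which the pair ("Friday", "choco") occurs twice. It shows the default mean, an alternative chosen through the aggfunc parameter, and what happens with set_index plus unstack, which has no aggregation step.

```python
import pandas as pd

# Hypothetical data: ("Friday", "choco") appears twice with different values.
df = pd.DataFrame({
    "day_of_week": ["Friday", "Friday", "Sunday"],
    "ice_cream": ["choco", "choco", "vanilla"],
    "proportion": [0.9, 0.8, 0.1],
})

# pivot_table collapses the duplicate pair with its default aggregation, the mean.
mean_tab = df.pivot_table(values="proportion", index="day_of_week", columns="ice_cream")
print(mean_tab)

# A different aggregation can be selected with aggfunc.
max_tab = df.pivot_table(values="proportion", index="day_of_week",
                         columns="ice_cream", aggfunc="max")
print(max_tab)

# set_index + unstack does not aggregate, so the duplicate pair raises an error.
try:
    df.set_index(["day_of_week", "ice_cream"])["proportion"].unstack()
except ValueError as err:
    print("unstack failed:", err)
```

If silent averaging would hide a data problem, checking for duplicate pairs first (for example with df.duplicated(['day_of_week', 'ice_cream'])) may be the safer choice.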
Upvotes: 4
Reputation: 38415
You are looking for pivot_table:
df = pd.pivot_table(df, index='day_of_week', columns='ice_cream', values='proportion')
You get:
ice_cream    chocolate  strawberry   vanilla
day_of_week
Friday        0.663506    0.251021  0.094473
Monday        0.691437    0.228828  0.079736
Saturday      0.712930         NaN  0.073350
Upvotes: 2