Reputation: 513
I have a simple question about how to pivot a pandas DataFrame when there is an additional column that has to be kept.
The dataset looks like this one:
X = pd.DataFrame({'country':['Peru','Peru','Japan','Japan'],'method':['m1','m2','m1','m2'], 'value':[1,2,3,4]})
Country | Method | Value
Peru | m1 | 1
Peru | m2 | 2
Japan | m1 | 3
Japan | m2 | 4
Every country has a value for every method. I would like to pivot this DataFrame so that each country becomes a column, while carrying the method column along:
Peru | Japan | Method
1 | 3 | m1
2 | 4 | m2
Thanks for the help!
Upvotes: 0
Views: 1158
Reputation: 863281
Solution with set_index and unstack:
print (X.set_index(['method','country'])['value']
.unstack(fill_value=0)
.rename_axis(None, axis=1)
.reset_index())
method Japan Peru
0 m1 3 1
1 m2 4 2
If the (method, country) pairs contain duplicates, this raises an error:
ValueError: Index contains duplicate entries, cannot reshape
In that case, use groupby with an aggregate function such as mean (or sum, ...):
X = pd.DataFrame({'country':['Peru','Peru','Peru','Japan'],
'method':['m1','m2','m1','m2'],
'value':[1,2,3,4]})
print (X)
country method value
0 Peru m1 1
1 Peru m2 2
2 Peru m1 3 <-duplicates Peru, m1
3 Japan m2 4
print (X.groupby(['method','country'])['value'].mean()
.unstack(fill_value=0)
.rename_axis(None, axis=1)
.reset_index())
method Japan Peru
0 m1 0 2
1 m2 4 2
Or pivot_table, whose default aggfunc is mean:
print (X.pivot_table(index='method',
columns='country',
values='value',
fill_value=0,
aggfunc='mean').
rename_axis(None, axis=1).
reset_index())
method Japan Peru
0 m1 0 2
1 m2 4 2
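To decide up front which of the approaches above is needed, one option is to check for duplicated (method, country) pairs before reshaping. This is only a minimal sketch, assuming the duplicate-containing X from above; the names `has_dupes` and `wide` are illustrative and not part of the original answer.

```python
import pandas as pd

X = pd.DataFrame({'country': ['Peru', 'Peru', 'Peru', 'Japan'],
                  'method': ['m1', 'm2', 'm1', 'm2'],
                  'value': [1, 2, 3, 4]})

# True if any (method, country) pair occurs more than once
has_dupes = X.duplicated(subset=['method', 'country']).any()

if has_dupes:
    # duplicates present: they must be aggregated (mean here, as above)
    wide = X.pivot_table(index='method', columns='country',
                         values='value', aggfunc='mean', fill_value=0)
else:
    # unique pairs: a plain reshape is enough
    wide = X.set_index(['method', 'country'])['value'].unstack(fill_value=0)

# drop the 'country' label on the column axis and restore 'method' as a column
wide = wide.rename_axis(None, axis=1).reset_index()
print(wide)
```

Whether mean, sum, or another aggregate is appropriate depends on what the duplicated rows represent in the data.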
Upvotes: 0
Reputation: 2006
Apply .pivot to X, followed by .reset_index. I have also removed the name of the columns axis for cleaner output.
df = X.pivot(index='method',columns='country',values='value').reset_index()
df.columns.name = ''
print(df)
Output:
method Japan Peru
0 m1 3 1
1 m2 4 2
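The question asks for Peru first, then Japan, with the method column last. One possible way to get that layout, sketched here with the original four-row X, is to select the columns in the wanted order after the pivot. The list `order` is just an illustrative name.

```python
import pandas as pd

X = pd.DataFrame({'country': ['Peru', 'Peru', 'Japan', 'Japan'],
                  'method': ['m1', 'm2', 'm1', 'm2'],
                  'value': [1, 2, 3, 4]})

df = X.pivot(index='method', columns='country', values='value').reset_index()
df.columns.name = None

# pivot sorts the new columns alphabetically, so pick the order explicitly
order = ['Peru', 'Japan', 'method']
df = df[order]
print(df)
```

With more countries, the list could instead be built from the order in which the countries first appear, for example with `X['country'].unique()`.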
Upvotes: 1