user3635284

Reputation: 513

Pandas Pivot with extra column

I have a simple question about how to pivot a Pandas DataFrame, with the extra complication of an additional column.

The dataset looks like this one:

import pandas as pd

X = pd.DataFrame({'country':['Peru','Peru','Japan','Japan'],'method':['m1','m2','m1','m2'], 'value':[1,2,3,4]})

Country   |   Method    |   Value
Peru      |   m1        |   1
Peru      |   m2        |   2
Japan     |   m1        |   3
Japan     |   m2        |   4

Every country has a value for every method. I would like to pivot this DataFrame so that each country becomes a column, but I also need to keep the method as a column:

Peru |  Japan | Method
1    |  3     | m1
2    |  4     | m2

Thanks for the help!

Upvotes: 0

Views: 1158

Answers (2)

jezrael

Reputation: 863281

Solution with set_index and unstack:

print (X.set_index(['method','country'])['value']
        .unstack(fill_value=0)
        .rename_axis(None, axis=1)
        .reset_index())

  method  Japan  Peru
0     m1      3     1
1     m2      4     2
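If you need exactly the column order shown in the question (Peru, Japan, then method), one option is to select the columns explicitly after the reshape. This is a minimal sketch assuming the original four-row data; the final column selection is only there to match the requested layout.

```python
import pandas as pd

X = pd.DataFrame({'country': ['Peru', 'Peru', 'Japan', 'Japan'],
                  'method': ['m1', 'm2', 'm1', 'm2'],
                  'value': [1, 2, 3, 4]})

out = (X.set_index(['method', 'country'])['value']
        .unstack(fill_value=0)          # countries become columns
        .rename_axis(None, axis=1)      # drop the 'country' axis label
        .reset_index())                 # move 'method' back to a column

# unstack sorts the new columns alphabetically (Japan, Peru),
# so select them explicitly to get the order from the question
out = out[['Peru', 'Japan', 'method']]
print(out)
```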

But if you get this error (because some method/country pairs are duplicated):

ValueError: Index contains duplicate entries, cannot reshape
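To check in advance whether this error will occur, you can look for repeated (method, country) pairs with DataFrame.duplicated. A small sketch, assuming a DataFrame where Peru/m1 appears twice:

```python
import pandas as pd

X = pd.DataFrame({'country': ['Peru', 'Peru', 'Peru', 'Japan'],
                  'method': ['m1', 'm2', 'm1', 'm2'],
                  'value': [1, 2, 3, 4]})

# keep=False marks every row of a repeated (method, country) pair
dup_mask = X.duplicated(subset=['method', 'country'], keep=False)
print(X[dup_mask])

# If any pair repeats, unstack cannot decide which value to use
has_duplicates = dup_mask.any()

try:
    X.set_index(['method', 'country'])['value'].unstack()
except ValueError as err:
    print('unstack failed:', err)
```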

use groupby with an aggregate function such as mean (or sum, ...) to combine the duplicates:

X = pd.DataFrame({'country':['Peru','Peru','Peru','Japan'],
                  'method':['m1','m2','m1','m2'], 
                  'value':[1,2,3,4]})
print (X)
  country method  value
0    Peru     m1      1
1    Peru     m2      2
2    Peru     m1      3   <- duplicate (Peru, m1)
3   Japan     m2      4

print (X.groupby(['method','country'])['value'].mean()
        .unstack(fill_value=0)
        .rename_axis(None, axis=1)
        .reset_index())

  method  Japan  Peru
0     m1      0     2
1     m2      4     2
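Swapping the aggregate function changes how duplicates are combined. For example, a sketch using sum instead of mean on the same data, so the two Peru/m1 values (1 and 3) are added rather than averaged:

```python
import pandas as pd

X = pd.DataFrame({'country': ['Peru', 'Peru', 'Peru', 'Japan'],
                  'method': ['m1', 'm2', 'm1', 'm2'],
                  'value': [1, 2, 3, 4]})

# Duplicate (method, country) pairs are summed before reshaping
out = (X.groupby(['method', 'country'])['value'].sum()
        .unstack(fill_value=0)          # missing pairs (Japan, m1) become 0
        .rename_axis(None, axis=1)
        .reset_index())
print(out)
```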

Or use pivot_table, whose default aggfunc is the mean:

print (X.pivot_table(index='method',
                     columns='country',
                     values='value',
                     fill_value=0,
                     aggfunc='mean')
        .rename_axis(None, axis=1)
        .reset_index())

  method  Japan  Peru
0     m1      0     2
1     m2      4     2
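The fill_value=0 argument matters here because Japan has no m1 row. A sketch comparing the result with and without it, on the same duplicated data:

```python
import pandas as pd

X = pd.DataFrame({'country': ['Peru', 'Peru', 'Peru', 'Japan'],
                  'method': ['m1', 'm2', 'm1', 'm2'],
                  'value': [1, 2, 3, 4]})

# Without fill_value, the missing (m1, Japan) combination is NaN
raw = X.pivot_table(index='method', columns='country', values='value',
                    aggfunc='mean')
print(raw)

# With fill_value=0, that gap is filled with 0
filled = X.pivot_table(index='method', columns='country', values='value',
                       aggfunc='mean', fill_value=0)
print(filled)
```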

Upvotes: 0

Alex Fung

Reputation: 2006

You need to apply .pivot to X, followed by .reset_index.

I have also removed the name of the columns axis for cleaner output.

df = X.pivot(index='method', columns='country', values='value').reset_index()
df.columns.name = None  # drop the 'country' label on the column axis
print(df)

Output:

  method  Japan  Peru
0     m1      3     1
1     m2      4     2
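Note that .pivot has the same limitation as unstack: it only works when every (method, country) pair is unique. A sketch, assuming a variant of the data where Peru/m1 appears twice, showing pivot failing and pivot_table aggregating instead:

```python
import pandas as pd

dup = pd.DataFrame({'country': ['Peru', 'Peru', 'Peru', 'Japan'],
                    'method': ['m1', 'm2', 'm1', 'm2'],
                    'value': [1, 2, 3, 4]})

try:
    dup.pivot(index='method', columns='country', values='value')
    pivot_failed = False
except ValueError as err:
    # pivot cannot choose between the two (m1, Peru) values
    print('pivot failed:', err)
    pivot_failed = True

# pivot_table averages the duplicates (its default aggregation)
ok = dup.pivot_table(index='method', columns='country', values='value')
print(ok)
```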

Upvotes: 1
