python pandas: pivoting columns to rows

Question

I have a table like:

country | name  | medals_won | year
-----------------------------------
US      | sarah |      1     | 2010
US      | sarah |      2     | 2011
US      | sarah |      5     | 2015
US      | alice |      3     | 2010
US      | alice |      4     | 2012
US      | alice |      1     | 2015
AU      | jones |      2     | 2013
AU      | jones |      8     | 2015

I want it like:

country | name  | 2010 | 2011 | 2012 | 2013 | 2014 | 2015
---------------------------------------------------------
US      | sarah | 1    | 2    | 0    | 0    | 0    | 5
US      | alice | 3    | 0    | 4    | 0    | 0    | 1
AU      | jones | 0    | 0    | 0    | 2    | 0    | 8

I've tinkered with df.apply, or even brute-force iteration, but you can probably guess that the tricky part is that these row values aren't strictly sequential, so this isn't a simple transpose operation (nobody won any medals in 2014, for e.g., but I want the resulting table to show that in a column full of zeros).

jezrael · Accepted Answer

You can use set_index + unstack:

df = df.set_index(['country','name','year'])['medals_won'].unstack(fill_value=0)
print (df)
year           2010  2011  2012  2013  2015
country name                               
AU      jones     0     0     0     2     8
US      alice     3     0     4     0     1
        sarah     1     2     0     0     5

If duplicates need aggregation like mean, sum... with pivot_table or groupby + aggregate function + unstack:

print (df)
  country   name  medals_won  year
0      US  sarah           1  2010 <-same US  sarah 2010, different 1
1      US  sarah           4  2010 <-same US  sarah 2010, different 4
2      US  sarah           2  2011
3      US  sarah           5  2015
4      US  alice           3  2010
5      US  alice           4  2012
6      US  alice           1  2015
7      AU  jones           2  2013
8      AU  jones           8  2015

df = df.pivot_table(index=['country','name'], 
                    columns='year', 
                    values='medals_won', 
                    fill_value=0, 
                    aggfunc='mean')
print (df)
year           2010  2011  2012  2013  2015
country name                               
AU      jones   0.0     0     0     2     8
US      alice   3.0     0     4     0     1
        sarah   2.5     2     0     0     5 <- (1+4)/2 = 2.5

Alternatively:

df = df.groupby(['country','name','year'])['medals_won'].mean().unstack(fill_value=0)
print (df)
year           2010  2011  2012  2013  2015
country name                               
AU      jones   0.0   0.0   0.0   2.0   8.0
US      alice   3.0   0.0   4.0   0.0   1.0
        sarah   2.5   2.0   0.0   0.0   5.0

Last:

df = df.reset_index().rename_axis(None, axis=1)
print (df)
  country   name  2010  2011  2012  2013  2015
0      AU  jones     0     0     0     2     8
1      US  alice     3     0     4     0     1
2      US  sarah     1     2     0     0     5

python pandas: pivoting columns to rows

Answers (2)

Related Questions