AndreyIto
AndreyIto

Reputation: 974

python pandas: pivoting columns to rows

I have a table like:

country | name  | medals_won | year
-----------------------------------
US      | sarah |      1     | 2010
US      | sarah |      2     | 2011
US      | sarah |      5     | 2015
US      | alice |      3     | 2010
US      | alice |      4     | 2012
US      | alice |      1     | 2015
AU      | jones |      2     | 2013
AU      | jones |      8     | 2015

I want it like:

country | name  | 2010 | 2011 | 2012 | 2013 | 2014 | 2015
---------------------------------------------------------
US      | sarah | 1    | 2    | 0    | 0    | 0    | 5
US      | alice | 3    | 0    | 4    | 0    | 0    | 1
AU      | jones | 0    | 0    | 0    | 2    | 0    | 8

I've tinkered with df.apply, or even brute-force iteration, but you can probably guess that the tricky part is that these row values aren't strictly sequential, so this isn't a simple transpose operation (nobody won any medals in 2014, for e.g., but I want the resulting table to show that in a column full of zeros).

Upvotes: 3

Views: 2615

Answers (2)

Ali Arsalan
Ali Arsalan

Reputation: 39

You can use the pivot_table() function of pandas and fill nan values with zero using pd.fillna(0)

    df = pd.DataFrame({
        'country' : pd.Series(['US', 'US', 'US', 'US', 'US', 'US', 'AU', 'AU']),
        'name' : pd.Series(['sarah', 'sarah','sarah','alice','alice','alice','jones','jones']),
        'medals_won' : pd.Series([1,2,5,3,4,1,2,8]),
        'year': pd.Series([2010,2011,2015,2010,2012,2015,2013,2015])    
        })
    pd.pivot_table(df, index=['country','name'], columns='year', aggfunc='sum').fillna(0)

my output

Upvotes: -1

jezrael
jezrael

Reputation: 862601

You can use set_index + unstack:

df = df.set_index(['country','name','year'])['medals_won'].unstack(fill_value=0)
print (df)
year           2010  2011  2012  2013  2015
country name                               
AU      jones     0     0     0     2     8
US      alice     3     0     4     0     1
        sarah     1     2     0     0     5

If duplicates need aggregation like mean, sum... with pivot_table or groupby + aggregate function + unstack:

print (df)
  country   name  medals_won  year
0      US  sarah           1  2010 <-same US  sarah 2010, different 1
1      US  sarah           4  2010 <-same US  sarah 2010, different 4
2      US  sarah           2  2011
3      US  sarah           5  2015
4      US  alice           3  2010
5      US  alice           4  2012
6      US  alice           1  2015
7      AU  jones           2  2013
8      AU  jones           8  2015

df = df.pivot_table(index=['country','name'], 
                    columns='year', 
                    values='medals_won', 
                    fill_value=0, 
                    aggfunc='mean')
print (df)
year           2010  2011  2012  2013  2015
country name                               
AU      jones   0.0     0     0     2     8
US      alice   3.0     0     4     0     1
        sarah   2.5     2     0     0     5 <- (1+4)/2 = 2.5

Alternatively:

df = df.groupby(['country','name','year'])['medals_won'].mean().unstack(fill_value=0)
print (df)
year           2010  2011  2012  2013  2015
country name                               
AU      jones   0.0   0.0   0.0   2.0   8.0
US      alice   3.0   0.0   4.0   0.0   1.0
        sarah   2.5   2.0   0.0   0.0   5.0

Last:

df = df.reset_index().rename_axis(None, axis=1)
print (df)
  country   name  2010  2011  2012  2013  2015
0      AU  jones     0     0     0     2     8
1      US  alice     3     0     4     0     1
2      US  sarah     1     2     0     0     5

Upvotes: 6

Related Questions