Reshape a column into multiple columns

Question

Im working with a df that looks like this:

Year Age Income
1945  1   234
1945  1   65
1945  2   234
...
1946  1   34
...
2015  5   34

I'd like to transform it to something like this, where income values fill the df and age is the index.

Age 1955 1956 1957 ... 2015
 0   
 1
 2 
 3
...
99

piRSquared · Accepted Answer

If your ['Year', 'Age'] combinations were unique, you'd just need set_index and unstack

df.set_index(['Year', 'Age']).Income.unstack('Year')

But they are not!
So, you have to decide how to aggregate the groups of combinations.
Below is an example of using sum

df.groupby(['Year', 'Age']).Income.sum().unstack('Year', fill_value=0)

Year  1945  1946  2015
Age                   
1      299    34     0
2      234     0     0
5        0     0    34

other aggregations

df.groupby(['Year', 'Age']).Income.first().unstack('Year', fill_value=0)
df.groupby(['Year', 'Age']).Income.last().unstack('Year', fill_value=0)
df.groupby(['Year', 'Age']).Income.mean().unstack('Year', fill_value=0)

pivot_table
pivot_table <-> pivot is analogous to set_index <-> groupby
Meaning, pivot_table handles aggregation.

df.pivot_table('Income', 'Age', 'Year', 'sum', fill_value=0)
df.pivot_table('Income', 'Age', 'Year', 'first', fill_value=0)
df.pivot_table('Income', 'Age', 'Year', 'last', fill_value=0)
df.pivot_table('Income', 'Age', 'Year', 'mean', fill_value=0)

Reshape a column into multiple columns

Answers (2)

Related Questions