Reputation: 489
Im working with a df that looks like this:
Year Age Income
1945 1 234
1945 1 65
1945 2 234
...
1946 1 34
...
2015 5 34
I'd like to transform it to something like this, where income values fill the df and age is the index.
Age 1955 1956 1957 ... 2015
0
1
2
3
...
99
Upvotes: 1
Views: 1154
Reputation: 294546
If your ['Year', 'Age']
combinations were unique, you'd just need set_index
and unstack
df.set_index(['Year', 'Age']).Income.unstack('Year')
But they are not!
So, you have to decide how to aggregate the groups of combinations.
Below is an example of using sum
df.groupby(['Year', 'Age']).Income.sum().unstack('Year', fill_value=0)
Year 1945 1946 2015
Age
1 299 34 0
2 234 0 0
5 0 0 34
other aggregations
df.groupby(['Year', 'Age']).Income.first().unstack('Year', fill_value=0)
df.groupby(['Year', 'Age']).Income.last().unstack('Year', fill_value=0)
df.groupby(['Year', 'Age']).Income.mean().unstack('Year', fill_value=0)
pivot_table
pivot_table
<-> pivot
is analogous to set_index
<-> groupby
Meaning, pivot_table
handles aggregation.
df.pivot_table('Income', 'Age', 'Year', 'sum', fill_value=0)
df.pivot_table('Income', 'Age', 'Year', 'first', fill_value=0)
df.pivot_table('Income', 'Age', 'Year', 'last', fill_value=0)
df.pivot_table('Income', 'Age', 'Year', 'mean', fill_value=0)
Upvotes: 3
Reputation: 2128
This looks like you'd like to pivot
your table. Pandas documentation is quite good.
import pandas as pd
df = pd.DataFrame([[1945, 1, 234],[1946, 1, 65],[1945, 2, 234]])
df.columns = ['year', 'age', 'income']
df.pivot(index='age', columns='year', values='income')
Which looks like:
year 1945 1946
age
1 234 65
2 234 None
Upvotes: 0