Reputation: 45
What would be the best way to approach this problem using pandas and python?
I currently have a pandas data-frame in a relatively awkward format, for example:
Country Indicator 2000 2010
0 Afghanistan foo 1 2.5
1 Afghanistan bar 3 4.5
2 Afghanistan zoo 5 6.5
3 Bolivia foo 7 8.5
4 Bolivia bar 9 10.5
5 Bolivia zoo 11 12.5
6 Cameroon foo 2 1.5
7 Cameroon bar 4 3.5
8 Cameroon zoo 6 5.5
9 Denmark foo 8 7.5
10 Denmark bar 10 9.5
11 Denmark zoo 12 11.5
Say that I want to split this into two separate data-frames for each respective year.
For 2000:
foo bar zoo
0 1 3 5
1 7 9 11
2 2 4 6
3 8 10 12
For 2010:
foo bar zoo
0 2.5 4.5 6.5
1 8.5 10.5 12.5
2 1.5 3.5 5.5
3 7.5 9.5 11.5
What might be the most efficient way to carry this out in pandas?
Thanks in advance.
Upvotes: 3
Views: 65
Reputation: 93
Using pivot-table allows you to save a sequence of labels: foo, bar, zoo. For 2000 year:
df2000 = df.pivot_table(index='Country', columns='Indicator', sort=False)['2000']
print(df2000)
Indicator foo bar zoo
Country
Afghanistan 1 3 5
Bolivia 7 9 11
Cameroon 2 4 6
Denmark 8 10 12
Upvotes: 0
Reputation: 2059
This transformation is called "pivoting", or sometimes "casting" or "unmelting". It's so common that it's covered by specific functions in the api.:
df_years = df.pivot(index='Country', columns='Indicator', values=['2000', '2010'])
2000 2010
Indicator bar foo zoo bar foo zoo
Country
Afghanistan 3.0 1.0 5.0 4.5 2.5 6.5
Bolivia 9.0 7.0 11.0 10.5 8.5 12.5
...
This results in multi level columns:
df_years['2000']
Indicator bar foo zoo
Country
Afghanistan 3.0 1.0 5.0
Bolivia 9.0 7.0 11.0
...
df_years['2010']
Indicator bar foo zoo
Country
Afghanistan 4.5 2.5 6.5
Bolivia 10.5 8.5 12.5
...
You should just work with these, but if you want flat dataframes, you can assign like this:
df_2000 = df_years['2000']
Indicator bar foo zoo
Country
Afghanistan 3.0 1.0 5.0
Bolivia 9.0 7.0 11.0
...
Upvotes: 4