Reputation: 43
i have been trying to reshape a given pandas dataframe for two days. I would like to transform my multi-index dataframe into multi-column form, but failed greatly by using pd.stack(), pd.unstack(), pd.melt(), ...
I have a generic multi-index dataframe, given as:
import pandas
df = pandas.DataFrame({'Scenario' : ['Scen1', 'Scen1', 'Scen1', 'Scen1',
'Scen1', 'Scen1', 'Scen2','Scen2',
'Scen2', 'Scen2','Scen2','Scen2'],
'Tech' : ['x', 'y', 'z', 'x', 'y', 'z',
'x', 'y', 'z', 'x', 'y', 'z'],
'Year' : ['2010', '2010', '2010',
'2015', '2015', '2015',
'2010', '2010', '2010',
'2015', '2015', '2015'],
'Sum' : ['1', '2', '3', '4',
'5', '6', '7', '8',
'9', '10', '11', '12']})
df.set_index(['Scenario', 'Tech'], inplace=True)
print(df)
Sum Year
Scenario Tech
Scen1 x 1 2010
y 2 2010
z 3 2010
x 4 2015
y 5 2015
z 6 2015
Scen2 x 7 2010
y 8 2010
z 9 2010
x 10 2015
y 11 2015
z 12 2015
However, I would like to convert it into the following form:
2010 2015
Tech Scen1 Scen2 Scen1 Scen2
x ... ... ... ...
y ... ... ... ...
z ... ... ... ...
Iam using python 3.7 for Windows 64bit with the pandas version 0.23.4 and would be very grateful for any kind of tips and possible solutions that will help me to get ahead with my problems.
Upvotes: 4
Views: 2525
Reputation: 38415
There are many good answers on melt and pivot on SO. In your sample df, sum column is of string type. Convert it to int and use pivot_table. Key difference between pivot and pivot_table is that when your index contains duplicated entries, you need to use pivot_table with some aggregate function. If you don't pass any function, the default is mean.
df['Sum'] = df['Sum'].astype(int)
df.pivot_table(index = 'Tech', columns = ['Year', 'Scenario'], values = 'Sum')
Year 2010 2015
Scenario Scen1 Scen2 Scen1 Scen2
Tech
x 1 7 4 10
y 2 8 5 11
z 3 9 6 12
Note: The same can be done using groupby. Since you need columns at two levels, you need to unstack twice.
df.groupby(['Tech','Scenario','Year'])['Sum'].mean().unstack().unstack()
Upvotes: 4