Reputation: 730
First off am I the only one who is incredibly frustrated by pivot_table creating extra levels of index/column names? I've never used pivot_tables and wanted anything other than a flat dataframe type, but every time I use it I'm forced to reset index, figure out column names, etc.
Here's an example, I have a dataframe that looks like this:
geography variable_type t0 t1 t2 t3
geo1 var1 1 3 4 8
geo1 var2 8 9 3 1
geo2 var1 4 7 9 4
geo2 var2 1 8 3 5
What I'm trying to accomplish is this:
geography time var1 var2
geo1 0 1 8
geo1 1 3 9
geo1 2 4 3
geo1 3 8 1
geo2 0 4 1
geo2 1 7 8
geo2 2 9 3
geo2 3 4 5
My code so far:
pd.melt(df,id_vars=['geography','variable_type']).pivot_table(index=['geography','variable'],columns='variable_type')
That gives me something where geography and variable look to be indices, and the columns appear to be the different variables but they seem to have a parent level of "value" and name of "variable_type". If I add a reset_index() like this:
pd.melt(df,id_vars=['geography','variable_type']).pivot_table(index=['geography','variable'],columns='variable_type').reset_index()
then I end up with the variables still having a name of "variable_type" and a parent level of "value", while the index is resolved and is 0-n.
So my questions are:
Upvotes: 0
Views: 754
Reputation: 25249
Add values
option to pivot_table
and chain rename_axis
df_final = (pd.melt(df,id_vars=['geography','variable_type'])
.pivot_table(index=['geography','variable'],columns='variable_type',values='value')
.reset_index().rename_axis(None,axis=1))
Out[75]:
geography variable var1 var2
0 geo1 t0 1 8
1 geo1 t1 3 9
2 geo1 t2 4 3
3 geo1 t3 8 1
4 geo2 t0 4 1
5 geo2 t1 7 8
6 geo2 t2 9 3
7 geo2 t3 4 5
Upvotes: 1