zachvac
zachvac

Reputation: 730

Pandas pivot_table and extra index levels

First off am I the only one who is incredibly frustrated by pivot_table creating extra levels of index/column names? I've never used pivot_tables and wanted anything other than a flat dataframe type, but every time I use it I'm forced to reset index, figure out column names, etc.

Here's an example, I have a dataframe that looks like this:

geography variable_type t0 t1 t2 t3
geo1      var1          1  3  4  8
geo1      var2          8  9  3  1
geo2      var1          4  7  9  4
geo2      var2          1  8  3  5

What I'm trying to accomplish is this:

geography time var1 var2
geo1      0    1    8
geo1      1    3    9
geo1      2    4    3
geo1      3    8    1
geo2      0    4    1
geo2      1    7    8
geo2      2    9    3
geo2      3    4    5

My code so far:

pd.melt(df,id_vars=['geography','variable_type']).pivot_table(index=['geography','variable'],columns='variable_type')

That gives me something where geography and variable look to be indices, and the columns appear to be the different variables but they seem to have a parent level of "value" and name of "variable_type". If I add a reset_index() like this:

pd.melt(df,id_vars=['geography','variable_type']).pivot_table(index=['geography','variable'],columns='variable_type').reset_index()

then I end up with the variables still having a name of "variable_type" and a parent level of "value", while the index is resolved and is 0-n.

So my questions are:

  1. How do I fix this in the here and now? I want a flat dataframe that looks like a csv or database table or any other normal dataframe.
  2. Is there another way to do this without using pivot_table, or maybe using pivot_table without introducing all these new index levels/names that I've never wanted ever in a billion years?

Upvotes: 0

Views: 754

Answers (1)

Andy L.
Andy L.

Reputation: 25249

Add values option to pivot_table and chain rename_axis

df_final =  (pd.melt(df,id_vars=['geography','variable_type'])
               .pivot_table(index=['geography','variable'],columns='variable_type',values='value')
               .reset_index().rename_axis(None,axis=1))

Out[75]:
  geography variable  var1  var2
0      geo1       t0     1     8
1      geo1       t1     3     9
2      geo1       t2     4     3
3      geo1       t3     8     1
4      geo2       t0     4     1
5      geo2       t1     7     8
6      geo2       t2     9     3
7      geo2       t3     4     5

Upvotes: 1

Related Questions