Aureon
Aureon

Reputation: 141

Pandas pivot table to dataframe

I have a pivot table (pt) that looks like this:

+---------+------------+-------+----+
|         | ZY         | z     |  y |  
+---------+------------+-------+----+
| period_s| ZONE       |       |    | 
+---------+------------+-------+----+
| 201901  | A          | 14    | 34 |
|         | B          | 232   | 9  |
|         | C          | 12    | 2  |
+---------+------------+-------+----+
| 201902  | A          | 196   | 70 |
|         | K          | 10    | 1  |
|         | D          | 313   | 99 |
+---------+------------+-------+----+

which came from a dataframe (df) using the following code:

pt=df.pivot_table(index=['period_s','ZONE'], columns='ZY', values='ID', aggfunc="count")

where the ZY field has two classes z and y.

I tried using the

df = table.reset_index()

also

df.columns = df.columns.droplevel(0) #remove amount
df.columns.name = None               #remove categories
df = df.reset_index()

As mentioned here transform pandas pivot table to regular dataframe and like this one Convert pivot tables to dataframe

I want to have a dataframe like this:

+---------+-------+------------+----------+
| period_s| ZONE  |    z       | y        |
+---------+-------+------------+----------+
|  201901 |     A | 14         |       34 |
|  201901 |     B | 232        |        9 |
|  201901 |     C | 12         |        2 |
|  201902 |     A | 196        |       70 |
|  201902 |     K | 10         |        1 |
|  201902 |     D | 313        |       99 |
+---------+-------+------------+----------+

Upvotes: 1

Views: 2735

Answers (1)

cottontail
cottontail

Reputation: 23041

It's a bit late but I think getting rid of pt.columns.name (i.e. "ZY") and resetting index would return the expected output. A method chain (set_axis() or rename_axis() to get rid of columns.name and reset_index() to convert period_s and ZONE into columns).

pt.set_axis(pt.columns.tolist(), axis=1).reset_index()
#pt.rename_axis(None, axis=1).reset_index()

A more straightforward way is to reset_index() and remove columns.name explicitly.

pt.reset_index(inplace=True)
pt.columns.name = None

A reproducible example:

import numpy as np
df = pd.DataFrame({'period_s': np.random.choice([201901, 201902], size=100),
                   'ZONE': np.random.choice([*'ABC'], size=100),
                   'ZY': np.random.choice([*'zy'], size=100),
                   'ID': np.arange(100)})
pt = df.pivot_table(index=['period_s','ZONE'], columns='ZY', values='ID', aggfunc="count")

# output
pt.set_axis(pt.columns.tolist(), axis=1).reset_index()

result

Upvotes: 1

Related Questions