Shane S
Shane S

Reputation: 2293

How to melt and unpivot a multi-header dataframe?

I have this data that I want to unpivot and melt into columns. The data is a multi-header table. I have a sample dictionary of the data.

Edit here___

I don't know how to convert a dictionary with multiple keys like I had shown previously into a df so let's restructure the dictionary like so...

data = {
    "id": {
        0: "month",
        1: "11/30/2021",
        2: "12/31/2021",
        3: "1/31/2022",
        4: "2/28/2022",
        5: "3/31/2022",
    },
    "A48": {0: "storage", 1: "0", 2: "29", 3: "35", 4: "33", 5: "30"},
    "A48.1": {0: "use", 1: "0", 2: "1", 3: "0", 4: "0", 5: "0"},
    "A62": {0: "direct", 1: "0", 2: "0", 3: "2", 4: "3", 5: "2"},
    "A62.1": {0: "storage", 1: "0", 2: "57", 3: "69", 4: "65", 5: "59"},
    "A62.2": {0: "use", 1: "0", 2: "1", 3: "0", 4: "0", 5: "0"},
}

Now let's get the Dataframe...

dfc = pd.DataFrame.from_dict(data)
dfc.columns=pd.MultiIndex.from_arrays([dfc.columns,dfc.iloc[0]])
dfc = dfc.iloc[1:].reset_index(drop=True)

Which looks like this:

           id     A48 A48.1    A62   A62.1 A62.2
        month storage   use direct storage   use
0  11/30/2021       0     0      0       0     0
1  12/31/2021      29     1      0      57     1
2   1/31/2022      35     0      2      69     0
3   2/28/2022      33     0      3      65     0
4   3/31/2022      30     0      2      59     0

What I am looking for is a table like this.

month id direct storage use
11/30/2021 A48 NaN 0 0
12/31/2021 A48 NaN 29 1
1/31/2022 A48 NaN 35 0
2/28/2022 A48 NaN 33 0
3/31/2022 A48 NaN 30 0
11/30/2021 A62 0 0 0
12/31/2021 A62 0 57 1
1/31/2022 A62 2 69 0
2/28/2022 A62 3 65 0
3/31/2022 A62 2 59 0

Upvotes: 2

Views: 444

Answers (2)

sammywemmy
sammywemmy

Reputation: 28644

One option is to flatten the columns, then reshape with pivot_longer from pyjanitor, using a regular expression to capture the groups:

# pip install pyjanitor
import pandas as pd
import janitor

# flatten columns:
# an alternative is dfc.collapse_levels()
# from pyjanitor
dfc.columns = dfc.columns.map('_'.join) 

# reshape:

(dfc
.pivot_longer(
    index = 'id_month', 
    names_to = ('id', '.value'), 
    names_pattern = r"([a-zA-Z]\d+)\.?\d?_(.+)")
.rename(columns={'id_month':'month'})
)
        month   id storage use direct
0  11/30/2021  A48       0   0    NaN
1  12/31/2021  A48      29   1    NaN
2   1/31/2022  A48      35   0    NaN
3   2/28/2022  A48      33   0    NaN
4   3/31/2022  A48      30   0    NaN
5  11/30/2021  A62       0   0      0
6  12/31/2021  A62      57   1      0
7   1/31/2022  A62      69   0      2
8   2/28/2022  A62      65   0      3
9   3/31/2022  A62      59   0      2

Another option is to rebuild the MultiIndex column, and then stack:

dfc.columns.names = ['top','bottom']
top = dfc.columns.get_level_values(level='top').str.split('.').str[0]
bottom = dfc.columns.get_level_values('bottom')
cols = pd.MultiIndex.from_arrays([top, bottom], names = ['top','bottom'])
dfc.columns = cols
(dfc
.set_index(('id','month'))
.stack('top')
.rename_axis(index=['month','id'], columns=None)
.reset_index()
)

        month   id direct storage use
0  11/30/2021  A48    NaN       0   0
1  11/30/2021  A62      0       0   0
2  12/31/2021  A48    NaN      29   1
3  12/31/2021  A62      0      57   1
4   1/31/2022  A48    NaN      35   0
5   1/31/2022  A62      2      69   0
6   2/28/2022  A48    NaN      33   0
7   2/28/2022  A62      3      65   0
8   3/31/2022  A48    NaN      30   0
9   3/31/2022  A62      2      59   0

Upvotes: 1

Laurent
Laurent

Reputation: 13458

Define for later use the following helper function:

import pandas as pd

def helper(df):
    return df.pipe(
        lambda df_: df_.rename(columns={"col1": df_["col0"].unique()[0]})
        .drop(columns="col0")
        .reset_index(drop=True)
    )

Then, with Pandas melt, concat and merge methods:

# Setup
n = dfc.shape[0]

# Melt dataframe and cleanup
melted_dfc = dfc.melt()
melted_dfc.columns = ["id", "col0", "col1"]
melted_dfc["id"] = melted_dfc["id"].replace(r"[.]\d+", "", regex=True)

# Get intermediate dataframes
month_df = helper(melted_dfc.loc[: n - 1, :]).drop(columns="id")
sub_dfs = [
    pd.concat([month_df, helper(df)], axis=1)
    for df in [
        melted_dfc.loc[i : i + n - 1, :] for i in range(n, melted_dfc.shape[0], n)
    ]
]

# Merge intermediate dataframes
final_df = sub_dfs[0]
for sub_df in sub_dfs[1:]:
    final_df = pd.merge(
        left=final_df, right=sub_df, how="outer", on=["month", "id"]
    ).fillna(0)

# Cleanup temporary columns created during merge
columns_to_merge = set(
    col[:-2] for col in final_df.columns if col.endswith(("_x", "_y"))
)
for col in columns_to_merge:
    final_df[col] = final_df[f"{col}_x"].astype(int) + final_df[f"{col}_y"].astype(int)
    final_df = final_df.drop(columns=[f"{col}_x", f"{col}_y"])

Finally:

print(final_df)
# Output
        month   id direct  storage  use
0  12/31/2021  A48      0       29    1
1   1/31/2022  A48      0       35    0
2   2/28/2022  A48      0       33    0
3   3/31/2022  A48      0       30    0
4  12/31/2021  A62      0       57    1
5   1/31/2022  A62      2       69    0
6   2/28/2022  A62      3       65    0
7   3/31/2022  A62      2       59    0

Upvotes: 1

Related Questions