Forecaster
Forecaster

Reputation: 33

How to melt/stack multiindex headers with pandas?

I have an Excel file containing credit spread calculations for different credit ratings. The structure includes merged cells which create multi-level column headers. Here is an example of the structure:

USD All Sectors A  USD All Sectors A  USD All Sectors A
Level  Date        1D                7D                14D
0      05/30/2023  66.42             66.42             66.42
0      05/31/2023  65.95             65.95             65.95

I need to transform this data into a long format using pandas so that it looks like this:

DATO        INDEKS            TENOR  NIVEAU
05/30/2023  USD All Sectors A  1D     66.42
05/31/2023  USD All Sectors A  1D     65.95
05/30/2023  USD All Sectors A  7D     66.42
05/31/2023  USD All Sectors A  7D     65.95
05/30/2023  USD All Sectors A  14D    66.42
05/31/2023  USD All Sectors A  14D    65.95
import pandas as pd

Load the data from the Excel file
data = pd.read_excel('credit_spreads.xlsx', header=[0, 1], skiprows=12)

df_stacked = df.stack(level=1).reset_index()
print(df_stacked.head(10))

Which gives the following output (a small part of the output):

  level_0 level_1  USD All Sectors A  USD All Sectors B  USD All Sectors BB  \
0        0    108M           129.6630           585.9322            335.4587   
1        0    120M           133.1738           585.9688            335.4797   
2        0     12M            66.4248           581.6899            333.0146   

However, I am having trouble getting the multi-index headers to work correctly, and the reshaped data is not in the desired format. The main issue seems to be handling the correct multiindex headers.

Is the stack function the correct function to use here? Or should i go with melt or pivot instead?

example of excel input attached.Excel Input

Upvotes: 1

Views: 101

Answers (2)

Soudipta Dutta
Soudipta Dutta

Reputation: 2152

import pandas as pd

# Sample data
data = {
    'Level': [0, 0],
    'Date': ['05/30/2023', '05/31/2023'],
    'USD All Sectors A 1D': [66.42, 65.95],
    'USD All Sectors A 7D': [66.42, 65.95],
    'USD All Sectors A 14D': [66.42, 65.95]
}

df = pd.DataFrame(data)
print(df.to_string())
"""
  Level        Date  USD All Sectors A 1D  USD All Sectors A 7D  USD All Sectors A 14D
0      0  05/30/2023                 66.42                 66.42                  66.42
1      0  05/31/2023                 65.95                 65.95                  65.95
     
"""
# Melt the DataFrame
df_melted = df.melt(id_vars=['Date'], 
                    value_vars=['USD All Sectors A 1D', 'USD All Sectors A 7D', 'USD All Sectors A 14D'], 
                    var_name='Sector_Days', 
                    value_name='VALUE')


"""
         Date            Sector_Days  VALUE
0  05/30/2023   USD All Sectors A 1D  66.42
1  05/31/2023   USD All Sectors A 1D  65.95
2  05/30/2023   USD All Sectors A 7D  66.42
3  05/31/2023   USD All Sectors A 7D  65.95
4  05/30/2023  USD All Sectors A 14D  66.42
5  05/31/2023  USD All Sectors A 14D  65.95
"""
# Split the 'Sector_Days' into 'SECTOR' and 'DAYS'
df_melted[['SECTOR', 'DAYS']] = df_melted['Sector_Days'].str.rsplit(' ', n=1, expand=True)


"""
           Date            Sector_Days  VALUE             SECTOR DAYS
0  05/30/2023   USD All Sectors A 1D  66.42  USD All Sectors A   1D
1  05/31/2023   USD All Sectors A 1D  65.95  USD All Sectors A   1D
2  05/30/2023   USD All Sectors A 7D  66.42  USD All Sectors A   7D
3  05/31/2023   USD All Sectors A 7D  65.95  USD All Sectors A   7D
4  05/30/2023  USD All Sectors A 14D  66.42  USD All Sectors A  14D
5  05/31/2023  USD All Sectors A 14D  65.95  USD All Sectors A  14D

"""

# Rename and reorder the columns
df_melted = df_melted.rename(columns={'Date': 'DATE'}).drop(columns='Sector_Days')

"""
         DATE  VALUE             SECTOR DAYS
0  05/30/2023  66.42  USD All Sectors A   1D
1  05/31/2023  65.95  USD All Sectors A   1D
2  05/30/2023  66.42  USD All Sectors A   7D
3  05/31/2023  65.95  USD All Sectors A   7D
4  05/30/2023  66.42  USD All Sectors A  14D
5  05/31/2023  65.95  USD All Sectors A  14D
"""


df_melted = df_melted[['DATE', 'SECTOR', 'DAYS', 'VALUE']]

print(df_melted)
"""
        DATE             SECTOR DAYS  VALUE
0  05/30/2023  USD All Sectors A   1D  66.42
1  05/31/2023  USD All Sectors A   1D  65.95
2  05/30/2023  USD All Sectors A   7D  66.42
3  05/31/2023  USD All Sectors A   7D  65.95
4  05/30/2023  USD All Sectors A  14D  66.42
5  05/31/2023  USD All Sectors A  14D  65.95

"""

Upvotes: 1

mozway
mozway

Reputation: 262254

If I understand correctly, you should set the first two columns as index while reading your file:

df = (pd.read_excel('credit_spreads.xlsx', skiprows=12,
                    header=[0, 1], index_col=[0, 1])
        .droplevel(0)
        .rename_axis(index='DATO', columns=['INDEKS', 'TENOR'])
        .stack([0, 1]).reset_index(name='NIVEAU')
     )

Output:

        DATO             INDEKS TENOR  NIVEAU
0 2023-05-30  USD All Sectors A    1D   66.42
1 2023-05-30  USD All Sectors A    7D   66.42
2 2023-05-30  USD All Sectors A   14D   66.42
3 2023-05-31  USD All Sectors A    1D   65.95
4 2023-05-31  USD All Sectors A    7D   65.95
5 2023-05-31  USD All Sectors A   14D   65.95

Alternatively, with melt:

df = (pd.read_excel('credit_spreads.xlsx', header=[0, 1], skiprows=12)
        .melt([('Unnamed: 0_level_0', 'Level'),
               ('Unnamed: 1_level_0', 'Date')], value_name='NIVEAU')
        .drop(columns=[('Unnamed: 0_level_0', 'Level')])
        .rename(columns={'variable_0': 'INDEKS', 'variable_1': 'TENOR',
                         ('Unnamed: 1_level_0', 'Date'): 'DATO'})
     )

Output:

        DATO             INDEKS TENOR  NIVEAU
0 2023-05-30  USD All Sectors A    1D   66.42
1 2023-05-31  USD All Sectors A    1D   65.95
2 2023-05-30  USD All Sectors A    7D   66.42
3 2023-05-31  USD All Sectors A    7D   65.95
4 2023-05-30  USD All Sectors A   14D   66.42
5 2023-05-31  USD All Sectors A   14D   65.95

Upvotes: 2

Related Questions