Radu Iordache
Radu Iordache

Reputation: 15

Changing level of some columns in multi index

I have a data frame that is looking like this (DATA is the year and month of the order) :

CUSTOMER_ID NAME DATA COFFEE_SOLD(KG) WATER_SOLD(L)
10000 ALEX 2022 - 01 3 4
10000 ALEX 2022 - 01 5 6
10000 ALEX 2022 - 02 7 8
10001 JOE 2022 - 02 1 1
10001 JOE 2022 - 03 1 0

I pivoted the df with :

df_rap = df_rap.pivot_table(index=["CUSTOMER_ID",'NAME',],columns=["DATA"], values=['COFFEE_SOLD(KG)','WATER_SOLD(L)'], aggfunc='sum').reset_index()

The result :

CUSTOMER_ID NAME COFFEE_SOLD(KG) COFFEE_SOLD(KG) COFFEE_SOLD(KG) WATER_SOLD(L) WATER_SOLD(L) WATER_SOLD(L)
DATA 2022 - 01 2022 - 02 2022 - 03 2022 - 01 2022 - 02 2022 - 03
0 10000 ALEX 8 7 0 10 8 0
1 10001 JOE 0 1 1 0 1 0

The format is ok but I want to export it to excel. For that I need the data frame to look like this :

COFFEE_SOLD(KG) COFFEE_SOLD(KG) COFFEE_SOLD(KG) WATER_SOLD(L) WATER_SOLD(L) WATER_SOLD(L)
DATA CUSTOMER_ID NAME 2022 - 01 2022 - 02 2022 - 03 2022 - 01 2022 - 02 2022 - 03
0 10000 ALEX 8 7 0 10 8 0
1 10001 JOE 0 1 1 0 1 0

In other words, i would like to lower the level of the first 2 column ( in header ), to save it in excel properly.

I tried :

df.reset_index()

And it dosen't work.

EDIT :

With :

display( df_copy.columns)

I saw the format of the columns :

MultiIndex([('CUSTOMER_ID', ''),
            ('NAME',        ''),
            ('COFFEE_SOLD(KG)', '2022 - 01'),
            ('COFFEE_SOLD(KG)', '2022 - 02'),
            ('COFFEE_SOLD(KG)', '2022 - 03'),
            ('WATER_SOLD(L)', '2022 - 01'),
            ('WATER_SOLD(L)', '2022 - 02'),
            ('WATER_SOLD(L)', '2022 - 03'),],
           names=[None, 'DATA'])

I expected to be :

MultiIndex([('', 'CUSTOMER_ID'),
            ('',        'NAME'),
            ('COFFEE_SOLD(KG)', '2022 - 01'),
            ('COFFEE_SOLD(KG)', '2022 - 02'),
            ('COFFEE_SOLD(KG)', '2022 - 03'),
            ('WATER_SOLD(L)', '2022 - 01'),
            ('WATER_SOLD(L)', '2022 - 02'),
            ('WATER_SOLD(L)', '2022 - 03'),],
           names=[None, 'DATA'])

Thank you !

Upvotes: 0

Views: 81

Answers (1)

rosa b.
rosa b.

Reputation: 2136

A possible approach is to overwrite the column values:

cols = [('', 'CUSTOMER_ID'), ('', 'NAME'),]
for t in df_rap.columns[2:]:
    cols.append(t)
    
df_rap.columns = pd.MultiIndex.from_tuples(cols)

This leads to a data frame without the word DATA in it. Which somehow makes sense, as DATA has lost some of it's meaning - now being just the name of the index column. If you nevertheless need to keep DATA, you could create a new column with the corresponding values, rename all columns and move the DATA column to the front (and save the data frame without the new index column) :

# create a new DATA column
df_rap['DATA'] = df_rap.index
# set new values for the column headers (this time including the new DATA column)
cols = [('', 'CUSTOMER_ID'), ('', 'NAME'),]
for t in df_rap.columns[2:-1]:
    cols.append(t)
cols.append(('', 'DATA'))
df_rap.columns = pd.MultiIndex.from_tuples(cols)
# reorder columns ('DATA' to the front)
cols = cols[-1:] + cols[:-1]
df_rap = df_rap[cols]

Upvotes: 1

Related Questions