Changing level of some columns in multi index

Question

I have a data frame that is looking like this (DATA is the year and month of the order) :

CUSTOMER_ID	NAME	DATA	COFFEE_SOLD(KG)	WATER_SOLD(L)
10000	ALEX	2022 - 01	3	4
10000	ALEX	2022 - 01	5	6
10000	ALEX	2022 - 02	7	8
10001	JOE	2022 - 02	1	1
10001	JOE	2022 - 03	1	0

I pivoted the df with :

df_rap = df_rap.pivot_table(index=["CUSTOMER_ID",'NAME',],columns=["DATA"], values=['COFFEE_SOLD(KG)','WATER_SOLD(L)'], aggfunc='sum').reset_index()

The result :

	CUSTOMER_ID	NAME	COFFEE_SOLD(KG)	COFFEE_SOLD(KG)	COFFEE_SOLD(KG)	WATER_SOLD(L)	WATER_SOLD(L)	WATER_SOLD(L)
DATA			2022 - 01	2022 - 02	2022 - 03	2022 - 01	2022 - 02	2022 - 03
0	10000	ALEX	8	7	0	10	8	0
1	10001	JOE	0	1	1	0	1	0

The format is ok but I want to export it to excel. For that I need the data frame to look like this :

			COFFEE_SOLD(KG)	COFFEE_SOLD(KG)	COFFEE_SOLD(KG)	WATER_SOLD(L)	WATER_SOLD(L)	WATER_SOLD(L)
DATA	CUSTOMER_ID	NAME	2022 - 01	2022 - 02	2022 - 03	2022 - 01	2022 - 02	2022 - 03
0	10000	ALEX	8	7	0	10	8	0
1	10001	JOE	0	1	1	0	1	0

In other words, i would like to lower the level of the first 2 column ( in header ), to save it in excel properly.

I tried :

df.reset_index()

And it dosen't work.

EDIT :

With :

display( df_copy.columns)

I saw the format of the columns :

MultiIndex([('CUSTOMER_ID', ''),
            ('NAME',        ''),
            ('COFFEE_SOLD(KG)', '2022 - 01'),
            ('COFFEE_SOLD(KG)', '2022 - 02'),
            ('COFFEE_SOLD(KG)', '2022 - 03'),
            ('WATER_SOLD(L)', '2022 - 01'),
            ('WATER_SOLD(L)', '2022 - 02'),
            ('WATER_SOLD(L)', '2022 - 03'),],
           names=[None, 'DATA'])

I expected to be :

MultiIndex([('', 'CUSTOMER_ID'),
            ('',        'NAME'),
            ('COFFEE_SOLD(KG)', '2022 - 01'),
            ('COFFEE_SOLD(KG)', '2022 - 02'),
            ('COFFEE_SOLD(KG)', '2022 - 03'),
            ('WATER_SOLD(L)', '2022 - 01'),
            ('WATER_SOLD(L)', '2022 - 02'),
            ('WATER_SOLD(L)', '2022 - 03'),],
           names=[None, 'DATA'])

Thank you !

rosa b. · Accepted Answer

A possible approach is to overwrite the column values:

cols = [('', 'CUSTOMER_ID'), ('', 'NAME'),]
for t in df_rap.columns[2:]:
    cols.append(t)
    
df_rap.columns = pd.MultiIndex.from_tuples(cols)

This leads to a data frame without the word DATA in it. Which somehow makes sense, as DATA has lost some of it's meaning - now being just the name of the index column. If you nevertheless need to keep DATA, you could create a new column with the corresponding values, rename all columns and move the DATA column to the front (and save the data frame without the new index column) :

# create a new DATA column
df_rap['DATA'] = df_rap.index
# set new values for the column headers (this time including the new DATA column)
cols = [('', 'CUSTOMER_ID'), ('', 'NAME'),]
for t in df_rap.columns[2:-1]:
    cols.append(t)
cols.append(('', 'DATA'))
df_rap.columns = pd.MultiIndex.from_tuples(cols)
# reorder columns ('DATA' to the front)
cols = cols[-1:] + cols[:-1]
df_rap = df_rap[cols]

Changing level of some columns in multi index

Answers (1)

Related Questions