Reputation: 15
I have a data frame that is looking like this (DATA is the year and month of the order) :
CUSTOMER_ID | NAME | DATA | COFFEE_SOLD(KG) | WATER_SOLD(L) |
---|---|---|---|---|
10000 | ALEX | 2022 - 01 | 3 | 4 |
10000 | ALEX | 2022 - 01 | 5 | 6 |
10000 | ALEX | 2022 - 02 | 7 | 8 |
10001 | JOE | 2022 - 02 | 1 | 1 |
10001 | JOE | 2022 - 03 | 1 | 0 |
I pivoted the df with :
df_rap = df_rap.pivot_table(index=["CUSTOMER_ID",'NAME',],columns=["DATA"], values=['COFFEE_SOLD(KG)','WATER_SOLD(L)'], aggfunc='sum').reset_index()
The result :
CUSTOMER_ID | NAME | COFFEE_SOLD(KG) | COFFEE_SOLD(KG) | COFFEE_SOLD(KG) | WATER_SOLD(L) | WATER_SOLD(L) | WATER_SOLD(L) | |
---|---|---|---|---|---|---|---|---|
DATA | 2022 - 01 | 2022 - 02 | 2022 - 03 | 2022 - 01 | 2022 - 02 | 2022 - 03 | ||
0 | 10000 | ALEX | 8 | 7 | 0 | 10 | 8 | 0 |
1 | 10001 | JOE | 0 | 1 | 1 | 0 | 1 | 0 |
The format is ok but I want to export it to excel. For that I need the data frame to look like this :
COFFEE_SOLD(KG) | COFFEE_SOLD(KG) | COFFEE_SOLD(KG) | WATER_SOLD(L) | WATER_SOLD(L) | WATER_SOLD(L) | |||
---|---|---|---|---|---|---|---|---|
DATA | CUSTOMER_ID | NAME | 2022 - 01 | 2022 - 02 | 2022 - 03 | 2022 - 01 | 2022 - 02 | 2022 - 03 |
0 | 10000 | ALEX | 8 | 7 | 0 | 10 | 8 | 0 |
1 | 10001 | JOE | 0 | 1 | 1 | 0 | 1 | 0 |
In other words, i would like to lower the level of the first 2 column ( in header ), to save it in excel properly.
I tried :
df.reset_index()
And it dosen't work.
EDIT :
With :
display( df_copy.columns)
I saw the format of the columns :
MultiIndex([('CUSTOMER_ID', ''),
('NAME', ''),
('COFFEE_SOLD(KG)', '2022 - 01'),
('COFFEE_SOLD(KG)', '2022 - 02'),
('COFFEE_SOLD(KG)', '2022 - 03'),
('WATER_SOLD(L)', '2022 - 01'),
('WATER_SOLD(L)', '2022 - 02'),
('WATER_SOLD(L)', '2022 - 03'),],
names=[None, 'DATA'])
I expected to be :
MultiIndex([('', 'CUSTOMER_ID'),
('', 'NAME'),
('COFFEE_SOLD(KG)', '2022 - 01'),
('COFFEE_SOLD(KG)', '2022 - 02'),
('COFFEE_SOLD(KG)', '2022 - 03'),
('WATER_SOLD(L)', '2022 - 01'),
('WATER_SOLD(L)', '2022 - 02'),
('WATER_SOLD(L)', '2022 - 03'),],
names=[None, 'DATA'])
Thank you !
Upvotes: 0
Views: 81
Reputation: 2136
A possible approach is to overwrite the column values:
cols = [('', 'CUSTOMER_ID'), ('', 'NAME'),]
for t in df_rap.columns[2:]:
cols.append(t)
df_rap.columns = pd.MultiIndex.from_tuples(cols)
This leads to a data frame without the word DATA
in it. Which somehow makes sense, as DATA
has lost some of it's meaning - now being just the name of the index column. If you nevertheless need to keep DATA
, you could create a new column with the corresponding values, rename all columns and move the DATA
column to the front (and save the data frame without the new index column) :
# create a new DATA column
df_rap['DATA'] = df_rap.index
# set new values for the column headers (this time including the new DATA column)
cols = [('', 'CUSTOMER_ID'), ('', 'NAME'),]
for t in df_rap.columns[2:-1]:
cols.append(t)
cols.append(('', 'DATA'))
df_rap.columns = pd.MultiIndex.from_tuples(cols)
# reorder columns ('DATA' to the front)
cols = cols[-1:] + cols[:-1]
df_rap = df_rap[cols]
Upvotes: 1