Reputation: 95
I have a dataframe which has a datetime column called Month and two other columns:
data = [['Canada',10, '2020-09-01'], ['Canada',20, '2020-10-01'], ['Canada',30, '2020-12-01'], ['Canada',40, '2021-01-01'],
['Europe',30, '2020-09-01'], ['Europe',20, '2020-10-01'], ['Europe',10, '2020-12-01'], ['Europe',40, '2021-01-01'],
['US',40, '2020-09-01'], ['US',10, '2020-10-01'], ['US',20, '2020-12-01'], ['US',30, '2021-01-01']]
df = pd.DataFrame(data,columns=['Region','sales', 'Month'])
Next, I convert the 'Month' column to a string with a specific format:
df['Month'] = df['Month'].dt.strftime('%b-%Y')
Now, I pivot the dataframe and export to excel:
df['Month'] = pd.pivot_table(df['Month'], values = 'sales', index=["Region"], columns = "Month").reset_index()
df.to_excel(writer, sheet_name='sales', index=False, startrow=4, header=False)
Since the 'Month' column is a string, when I write the dataframe to excel, the dates are sorted alphabetically. I want the dates to be sorted by datetime values.
I tried to convert the 'Month' column to datetime before pivoting, but in that case I don't get the dates in the right format after exporting to excel:
df['Month'] = pd.to_datetime(df['Month'], format='%b-%Y')
I even tried to use ExcelWriter format but that doesn't seem to work as well.
df['Month'] = pd.to_datetime(df['Month'])
df = pd.pivot_table(df, values = 'sales', index=["Region"], columns = "Month").reset_index()
df = df.append(pd.Series(df.sum(),name='System'))\
.assign(Total=df.sum(1))
# extract the datetime component of the multilevel column names
dates = [v for v in df.columns[1:]]
# reformat dates to the desired string format
dates_str = [v.strftime('%b-%Y') for v in dates]
# create a dict
updates = dict(zip(dates, dates_str))
# rename the columns, which will stay in the current, correct order
df = df.rename(columns=updates, inplace=True)
df.to_excel(writer, sheet_name='sales', index=False, startrow=4, header=False)
Upvotes: 1
Views: 441
Reputation: 62383
'Month'
column dates are a datetime
dtype
from pd.to_datetime
.
df.Month = pd.to_datetime(df.Month)
, before pivoting.'Months'
to a string, and then rename the columns.
.rename
is used, because assigning column names by list slicing and assignment results in a TypeError
(e.g. df.columns[:1] = 3
)import pandas as pd
# sample dataframe
data = {'Region': ['Canada', 'Canada', 'Canada', 'Canada', 'Europe', 'Europe', 'Europe', 'Europe', 'US', 'US', 'US', 'US'],
'sales': [10, 20, 30, 40, 30, 20, 10, 40, 40, 10, 20, 30],
'Month': [pd.Timestamp('2020-09-01 00:00:00'), pd.Timestamp('2020-10-01 00:00:00'), pd.Timestamp('2020-12-01 00:00:00'), pd.Timestamp('2021-01-01 00:00:00'), pd.Timestamp('2020-09-01 00:00:00'), pd.Timestamp('2020-10-01 00:00:00'), pd.Timestamp('2020-12-01 00:00:00'), pd.Timestamp('2021-01-01 00:00:00'), pd.Timestamp('2020-09-01 00:00:00'), pd.Timestamp('2020-10-01 00:00:00'), pd.Timestamp('2020-12-01 00:00:00'), pd.Timestamp('2021-01-01 00:00:00')]}
df = pd.DataFrame(data)
# pivot the dataframe before changing Month to a string
dfp = pd.pivot_table(df, values='sales', index=["Region"], columns="Month").reset_index()
# add Total column
dfp = dfp.append(pd.Series(dfp.sum(), name='System')).assign(Total=dfp.sum(1))
# extract the datetime column names
dates = [v for v in dfp.columns[1:-1]]
# reformat dates to the desired string format
dates_str = [v.strftime('%b-%Y') for v in dates]
# create a dict
updates = dict(zip(dates, dates_str))
# rename the columns, which will stay in the current, correct order
dfp.rename(columns=updates, inplace=True)
# display(dfp)
Month Region Sep-2020 Oct-2020 Dec-2020 Jan-2021 Total
0 Canada 10 20 30 40 100.0
1 Europe 30 20 10 40 100.0
2 US 40 10 20 30 100.0
System CanadaEuropeUS 80 50 60 110 NaN
# save to Excel, without the Total column; remove .iloc[:, :-1] to keep the Total column
dfp.iloc[:, :-1].to_excel('test.xlsx', sheet_name='sales', index=False, header=True)
Upvotes: 1