mancal
mancal

Reputation: 95

How to sort column names, that are string formatted dates, by datetime order

I have a dataframe which has a datetime column called Month and two other columns:

data = [['Canada',10, '2020-09-01'], ['Canada',20, '2020-10-01'], ['Canada',30, '2020-12-01'], ['Canada',40, '2021-01-01'],
            ['Europe',30, '2020-09-01'], ['Europe',20, '2020-10-01'], ['Europe',10, '2020-12-01'], ['Europe',40, '2021-01-01'],
            ['US',40, '2020-09-01'], ['US',10, '2020-10-01'], ['US',20, '2020-12-01'], ['US',30, '2021-01-01']]

df = pd.DataFrame(data,columns=['Region','sales', 'Month'])

Next, I convert the 'Month' column to a string with a specific format:

df['Month'] = df['Month'].dt.strftime('%b-%Y')

Now, I pivot the dataframe and export to excel:

df['Month'] = pd.pivot_table(df['Month'], values = 'sales', index=["Region"], columns = "Month").reset_index() 

df.to_excel(writer, sheet_name='sales', index=False, startrow=4, header=False)

Since the 'Month' column is a string, when I write the dataframe to excel, the dates are sorted alphabetically. I want the dates to be sorted by datetime values.

I tried to convert the 'Month' column to datetime before pivoting, but in that case I don't get the dates in the right format after exporting to excel:

df['Month'] = pd.to_datetime(df['Month'], format='%b-%Y')

I even tried to use ExcelWriter format but that doesn't seem to work as well.

Output which I am getting: Output which I am getting:

Desired Output: Desired Output

df['Month'] = pd.to_datetime(df['Month'])
df = pd.pivot_table(df, values = 'sales', index=["Region"], columns = "Month").reset_index()
df = df.append(pd.Series(df.sum(),name='System'))\
    .assign(Total=df.sum(1))

# extract the datetime component of the multilevel column names
dates = [v for v in df.columns[1:]]

# reformat dates to the desired string format
dates_str = [v.strftime('%b-%Y') for v in dates]

# create a dict
updates = dict(zip(dates, dates_str))

# rename the columns, which will stay in the current, correct order
df = df.rename(columns=updates, inplace=True)

df.to_excel(writer, sheet_name='sales', index=False, startrow=4, header=False)

Upvotes: 1

Views: 441

Answers (1)

Trenton McKinney
Trenton McKinney

Reputation: 62383

  • This assumes the 'Month' column dates are a datetime dtype from pd.to_datetime.
    • df.Month = pd.to_datetime(df.Month), before pivoting.
  • Pivot the dataframe before converting 'Months' to a string, and then rename the columns.
    • .rename is used, because assigning column names by list slicing and assignment results in a TypeError (e.g. df.columns[:1] = 3)
import pandas as pd

# sample dataframe
data = {'Region': ['Canada', 'Canada', 'Canada', 'Canada', 'Europe', 'Europe', 'Europe', 'Europe', 'US', 'US', 'US', 'US'],
        'sales': [10, 20, 30, 40, 30, 20, 10, 40, 40, 10, 20, 30],
        'Month': [pd.Timestamp('2020-09-01 00:00:00'), pd.Timestamp('2020-10-01 00:00:00'), pd.Timestamp('2020-12-01 00:00:00'), pd.Timestamp('2021-01-01 00:00:00'), pd.Timestamp('2020-09-01 00:00:00'), pd.Timestamp('2020-10-01 00:00:00'), pd.Timestamp('2020-12-01 00:00:00'), pd.Timestamp('2021-01-01 00:00:00'), pd.Timestamp('2020-09-01 00:00:00'), pd.Timestamp('2020-10-01 00:00:00'), pd.Timestamp('2020-12-01 00:00:00'), pd.Timestamp('2021-01-01 00:00:00')]}
df = pd.DataFrame(data)

# pivot the dataframe before changing Month to a string
dfp = pd.pivot_table(df, values='sales', index=["Region"], columns="Month").reset_index() 

# add Total column
dfp = dfp.append(pd.Series(dfp.sum(), name='System')).assign(Total=dfp.sum(1))

# extract the datetime column names
dates = [v for v in dfp.columns[1:-1]]

# reformat dates to the desired string format
dates_str = [v.strftime('%b-%Y') for v in dates]

# create a dict
updates = dict(zip(dates, dates_str))

# rename the columns, which will stay in the current, correct order
dfp.rename(columns=updates, inplace=True)

# display(dfp)
Month           Region  Sep-2020  Oct-2020  Dec-2020  Jan-2021  Total
0               Canada        10        20        30        40  100.0
1               Europe        30        20        10        40  100.0
2                   US        40        10        20        30  100.0
System  CanadaEuropeUS        80        50        60       110    NaN

# save to Excel, without the Total column; remove .iloc[:, :-1] to keep the Total column
dfp.iloc[:, :-1].to_excel('test.xlsx', sheet_name='sales', index=False, header=True)

Excel view

enter image description here

Upvotes: 1

Related Questions