Christian
Christian

Reputation: 509

Create a custom month that starts at day x

From my date column I want to create another one with the months. But I would like create a specific range. For exammple Jan will be from 2020-01-03 to 2020-02-03. Feb will be from 20-02-03 to 20-03-03 and so on.

This is my df:

import pandas as pd
import datetime as dt
#Start of the calendar
start='2020-01-01'
#End of the calendar
end='2020-12-31'
#Create the calendar df
cal_df = pd.DataFrame({"Date": pd.date_range(start, end)})
#Extract the day
cal_df['Day'] = cal_df['Date'].dt.day
#Extract the day name
cal_df['Day_name'] =cal_df[['Date']].apply(lambda x: dt.datetime.strftime(x['Date'], '%A'), axis=1)

I tried to use the if statement but does not behave like I want.

start = '2020-01-03'
end_date = '2020-02-03'
if  [(cal_df['Date'] > start) & (cal_df['Date'] <= end_date)]:
    cal_df['Month'] = 'Jan'
else:
    cal_df['Month'] = ' others'

So the above code puts Jan everywhere even if Date is les than the specified end_date.

Can someone tell what I am doing wrong?

Upvotes: 2

Views: 105

Answers (1)

jezrael
jezrael

Reputation: 862731

You can use GroupBy.transform with helper groups compared next day, 4 with Series.dt.strftime for months:

m = cal_df['Date'].dt.strftime('%b')
g = cal_df['Day'].eq(4).cumsum()

cal_df['Month'] = m.groupby(g).transform('first')

Or idea from comments - subtract 3 days, but first values are Dec:

cal_df['Month1'] = cal_df['Date'].sub(pd.Timedelta(3, 'd')).dt.strftime('%b')

So you can change solution by Series.where and back filling missing values:

s = cal_df['Date'].sub(pd.Timedelta(3, 'd'))
cal_df['Month2'] =  s.where(s > cal_df['Date'].min()).bfill().dt.strftime('%b')

m = cal_df['Date'].dt.strftime('%b')
g = cal_df['Day'].eq(4).cumsum()

cal_df['Month'] = m.groupby(g).transform('first')

cal_df['Month1'] = cal_df['Date'].sub(pd.Timedelta(3, 'd')).dt.strftime('%b')

s = cal_df['Date'].sub(pd.Timedelta(3, 'd'))
cal_df['Month2'] =  s.where(s > cal_df['Date'].min()).bfill().dt.strftime('%b')

print (cal_df.head(10))
        Date  Day   Day_name Month Month1 Month2
0 2020-01-01    1  Wednesday   Jan    Dec    Jan
1 2020-01-02    2   Thursday   Jan    Dec    Jan
2 2020-01-03    3     Friday   Jan    Dec    Jan
3 2020-01-04    4   Saturday   Jan    Jan    Jan
4 2020-01-05    5     Sunday   Jan    Jan    Jan
5 2020-01-06    6     Monday   Jan    Jan    Jan
6 2020-01-07    7    Tuesday   Jan    Jan    Jan
7 2020-01-08    8  Wednesday   Jan    Jan    Jan
8 2020-01-09    9   Thursday   Jan    Jan    Jan
9 2020-01-10   10     Friday   Jan    Jan    Jan

Upvotes: 2

Related Questions