freddy888

Reputation: 1010

pandas: subtract timedelta in loop

I have daily data and a loop that finds the third Friday of a month and then sets a column to 2 for the 20 days before and after that third Friday. However, only the days AFTER the third Friday get marked, not the days before, and I do not understand why. My dataframe "merged" looks like this:

Date         ID    Window
01/01/2000   1        0
01/01/2000   1        0
02/01/2000   2        0
02/01/2000   2        0

The code so far is the following:

#Get the third Friday of a month:

c = calendar.Calendar(firstweekday=calendar.SUNDAY)
year = 2000; month = 3
monthcal = c.monthdatescalendar(year,month)
third_friday = [day for week in monthcal for day in week if \
            day.weekday() == calendar.FRIDAY and \
            day.month == month][2]

#Loop through dates to change the window column:

for beg in pd.date_range("2000-01-01", "2017-05-01"):
    beg = third_friday
    merged["window"].loc[beg: beg + pd.to_timedelta(20,"D")] = 2
    merged["window"].loc[beg: beg - pd.to_timedelta(20,"D")] = 2

#repeat the same for the next Fridays:
    if month==12:
       year=year+1
       month=0
    if year>=2017 and month>=3:
       break
    month = month +3
    monthcal = c.monthdatescalendar(year,month)
    third_friday = [day for week in monthcal for day in week if \
                day.weekday() == calendar.FRIDAY and \
                day.month == month][2] 

When I run this code, the window column is not set to 2 BEFORE the third Friday. Only the 20 days after the third Friday are changed to 2. Does anybody know what I am doing wrong?

Upvotes: 0

Views: 105

Answers (1)

Maarten Fabré

Reputation: 7058

Third Friday of the month

The easiest approach is to define a function that calculates the third Friday of a month, given a year and month. Either use your approach with calendar, or something like this should work too:

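import pandas as pd

def third_friday_of(year, month):
    # The third Friday of a month always falls on the 15th through the 21st
    daterange = pd.date_range(start='%i-%02i-15' % (year, month), end='%i-%02i-21' % (year, month), freq='D')
    return daterange[daterange.weekday == 4][0]  # weekday 4 is Friday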

This returns a pandas.Timestamp, which is a subclass of datetime.datetime, so it should pose no further problems in your program.
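
As a quick sanity check, the sketch below compares a date_range-based helper with the calendar-based approach from the question for every month from 2000 through 2017. The helper names third_friday_pandas and third_friday_calendar are made up for this illustration.

```python
import calendar
import pandas as pd

def third_friday_pandas(year, month):
    # The third Friday always falls on the 15th through the 21st.
    days = pd.date_range(start="%d-%02d-15" % (year, month), periods=7, freq="D")
    return days[days.weekday == 4][0]  # weekday 4 is Friday

def third_friday_calendar(year, month):
    # Same idea as the list comprehension in the question.
    cal = calendar.Calendar(firstweekday=calendar.SUNDAY)
    fridays = [day for week in cal.monthdatescalendar(year, month)
               for day in week
               if day.weekday() == calendar.FRIDAY and day.month == month]
    return fridays[2]

# Collect every (year, month) where the two approaches disagree.
mismatches = [(y, m) for y in range(2000, 2018) for m in range(1, 13)
              if third_friday_pandas(y, m).date() != third_friday_calendar(y, m)]
print(mismatches)                    # []
print(third_friday_pandas(2000, 3))  # 2000-03-17 00:00:00
```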

Actual calculations

I also defined a separate function to do the actual changing of the DataFrame, with the interval and window as parameters:

def process_dataframe(df, begin_year, begin_month, end_year, end_month, interval_months=3, window=20):
    # Assumes df has a sorted DatetimeIndex, so that .loc can slice by date.
    # Month-start ('MS') dates include end_month itself, so the end needs no adjustment.
    dates = pd.date_range(start='%i-%02i-01' % (begin_year, begin_month), end='%i-%02i-01' % (end_year, end_month), freq='%iMS' % interval_months)
    for d in dates:
        third_friday = third_friday_of(d.year, d.month)
        # Mark `window` days on both sides of the third Friday
        df.loc[third_friday - pd.Timedelta(days=window) : third_friday + pd.Timedelta(days=window), 'Window'] = 2
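
Slicing .loc by date like this only works when the DataFrame is indexed by dates. A minimal sketch of preparing a frame shaped like the one in the question, assuming the Date column holds day/month/year strings (the question does not say which order is used):

```python
import pandas as pd

# Toy frame shaped like the sample in the question.
merged = pd.DataFrame({
    "Date": ["01/01/2000", "01/01/2000", "02/01/2000", "02/01/2000"],
    "ID": [1, 1, 2, 2],
    "Window": [0, 0, 0, 0],
})

# Assumption: the strings are day/month/year.
merged["Date"] = pd.to_datetime(merged["Date"], format="%d/%m/%Y")

# A sorted DatetimeIndex allows label slices by date, even with duplicate dates.
merged = merged.set_index("Date").sort_index()

# Rows and column are selected in one .loc call.
merged.loc["2000-01-02":"2000-01-05", "Window"] = 2
print(merged)
```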

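The reason it did not work for you is most likely the order of the slice bounds. On a sorted date index, a label slice runs from the earlier date to the later one, so beg: beg - pd.to_timedelta(20,"D") starts after it ends and selects nothing. merged["window"].loc[beg: beg - pd.to_timedelta(20,"D")] = 2 should have been merged["window"].loc[beg - pd.to_timedelta(20,"D"):beg] = 2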
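
A small sketch of the slice-order effect on a toy frame (the daily index and the date 2000-03-17 are chosen only for illustration):

```python
import pandas as pd

# Toy daily frame covering the days around one third Friday.
idx = pd.date_range("2000-02-01", "2000-04-30", freq="D")
merged = pd.DataFrame({"Window": 0}, index=idx)

beg = pd.Timestamp("2000-03-17")
delta = pd.Timedelta(days=20)

backwards = merged.loc[beg: beg - delta]  # start is after end: nothing selected
forwards = merged.loc[beg - delta: beg]   # earlier bound first: 21 rows, both ends included
print(len(backwards), len(forwards))      # 0 21

# Both sides of the window in a single assignment.
merged.loc[beg - delta: beg + delta, "Window"] = 2
print(int((merged["Window"] == 2).sum()))  # 41
```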

Chained assignment

merged["window"].loc[beg: beg + pd.to_timedelta(20,"D")] = 2 has a second problem of its own. merged["window"] first returns a Series, and it is not guaranteed whether that Series is a view of the DataFrame or a copy, so the assignment may or may not reach merged. It is better to select the rows and the column in a single .loc call, as in my code.
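
A minimal sketch of the single .loc form on a toy frame (the dates are arbitrary). Whether the chained form writes through depends on the pandas version and its copy-on-write settings, which is why it is avoided here:

```python
import pandas as pd

idx = pd.date_range("2000-03-01", "2000-03-31", freq="D")
merged = pd.DataFrame({"ID": 1, "Window": 0}, index=idx)

start = pd.Timestamp("2000-03-10")
end = pd.Timestamp("2000-03-17")

# One .loc call with the row slice and the column label together,
# so pandas assigns directly into `merged`.
merged.loc[start:end, "Window"] = 2

print(merged["Window"].value_counts().to_dict())
```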

Upvotes: 1
