Reputation: 2065
I have a dataframe with a timeseries data of wheat in df
.
df = wt["WHEAT_USD"]
2016-05-02 02:00:00+02:00 4.780
2016-05-02 02:01:00+02:00 4.777
2016-05-02 02:02:00+02:00 4.780
2016-05-02 02:03:00+02:00 4.780
2016-05-02 02:04:00+02:00 4.780
Name: closeAsk, dtype: float64
When I plot the data it has these annoying horizontal lines because of weekends. Is there a simple way of removing the non-business days from the dataframe itself?
Something like
df = df.BDays()
Upvotes: 39
Views: 39048
Reputation: 4608
simply, filtering can be done by day names. For example if you don't want saturdays and sundays you can use this:
df=df[(df['date'].dt.day_name()!='Saturday') & (df['date'].dt.day_name()!='Sunday')]
not for special holidays etc
Upvotes: 0
Reputation: 1979
Building on @Andy Hayden solution, you can also use query
with a dataframe
for better method chaining in a "modern pandas" fashion.
my_date
)df.query("my_date.dt.dayofweek < 5")
my_index_name
or date
)df.query("my_index_name.dt.dayofweek < 5")
df.rename_axis("date").query("date.dt.dayofweek < 5")
( index.dt.dayofweek
or index.dayofweek
does not works for me ) .
Upvotes: 0
Reputation: 1
using workdays, you can count for holidays pretty easily
import workdays as wd
def drop_non_busdays(df, holidays=None):
if holidays is None:
holidays = []
start_date = df.index.to_list()[0].date()
end_date = df.index.to_list()[-1].date()
start_wd = wd.workday(wd.workday(start_date, -1, holidays), 1, holidays)
end_wd = wd.workday(wd.workday(end_date, 1, holidays), -1, holidays)
b_days = [start_wd]
while b_days[-1] < end_wd:
b_days.append(wd.workday(b_days[-1], 1, holidays))
valid = [i in b_days for i in df.index]
return df[valid]
Upvotes: 0
Reputation: 56
I am building a backtester for stock/FX trading and I also have these issue with days that are nan because that they are holidays or other non trading days.. you can download a financial calendar for the days that there is no trading and then you need to think about timezone and weekends.. etc..
But the best solution is not to use date/time as the index for the candles or price. So do not connect your price data to a date/time but just to a counter of candles or prices .. you can use a second index for this.. so for calculations of MA or other technical lines dont use date/time .. if you look at Metatrader 4/5 it also doesnt use date/time but the index of the data is the candle number !!
I think that you need to let go of the date-time for the price if you work with stock or FX data , of cause you can put them in a column of the data-frame but dont use it as the index This way you can avoid many problems
Upvotes: 3
Reputation: 1080
Pandas BDay
just ends up using .dayofweek<5
like the chosen answer, but can be extended to account for bank holidays, etc.
import pandas as pd
from pandas.tseries.offsets import BDay
isBusinessDay = BDay().onOffset
csv_path = 'C:\\Python27\\Lib\\site-packages\\bokeh\\sampledata\\daylight_warsaw_2013.csv'
dates_df = pd.read_csv(csv_path)
match_series = pd.to_datetime(dates_df['Date']).map(isBusinessDay)
dates_df[match_series]
Upvotes: 15
Reputation: 375485
One simple solution is to slice out the days not in Monday to Friday:
In [11]: s[s.index.dayofweek < 5]
Out[11]:
2016-05-02 00:00:00 4.780
2016-05-02 00:01:00 4.777
2016-05-02 00:02:00 4.780
2016-05-02 00:03:00 4.780
2016-05-02 00:04:00 4.780
Name: closeAsk, dtype: float64
Note: this doesn't take into account bank holidays etc.
Upvotes: 59