Reputation: 1627
I have a time series DataFrame and need to create a boolean column that flags whether each row's time falls between certain hours of the day. I can get an array of the matching index positions, but how do I turn that into a boolean column? Is indexer_between_time the quickest way to do this calculation?
aapl.csv
Datetime,Open,High,Low,Close,Volume,Dividends,Stock Splits
2020-10-26 04:15:00-04:00,113.7,113.78,113.5,113.5,0,0,0
2020-10-26 04:16:00-04:00,113.5,113.72,113.5,113.72,0,0,0
2020-10-26 04:17:00-04:00,113.69,113.79,113.65,113.74,0,0,0
2020-10-26 04:18:00-04:00,113.65,113.65,113.59,113.6,0,0,0
2020-10-26 04:19:00-04:00,113.55,113.59,113.4,113.54,0,0,0
2020-10-26 04:20:00-04:00,113.5,113.68,113.5,113.68,0,0,0
2020-10-26 04:21:00-04:00,113.71,113.71,113.6,113.6,0,0,0
2020-10-26 04:22:00-04:00,113.68,113.68,113.67,113.68,0,0,0
>>> df = pd.read_csv("aapl.csv", index_col="Datetime", parse_dates=True)
>>> df
                             Open    High     Low   Close  Volume  Dividends  Stock Splits
Datetime
2020-10-26 04:15:00-04:00  113.70  113.78  113.50  113.50       0          0             0
2020-10-26 04:16:00-04:00  113.50  113.72  113.50  113.72       0          0             0
2020-10-26 04:17:00-04:00  113.69  113.79  113.65  113.74       0          0             0
2020-10-26 04:18:00-04:00  113.65  113.65  113.59  113.60       0          0             0
2020-10-26 04:19:00-04:00  113.55  113.59  113.40  113.54       0          0             0
df.index.indexer_between_time('9:30','15:59')
array([ 264, 265, 266, ..., 4166, 4167, 4168])
df['rth'] = ... 1 if in above array, else 0
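One way to turn those positions into a column is to start from an all-False array and flip the matching positions. This is only a sketch on synthetic one-minute bars, not on aapl.csv: the date range, the time zone name, and the Close values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic one-minute bars for a single day (illustrative data, not aapl.csv).
idx = pd.date_range("2020-10-26 04:00", "2020-10-26 19:59",
                    freq="min", tz="America/New_York")
df = pd.DataFrame({"Close": np.linspace(113.0, 115.0, len(idx))}, index=idx)

# Integer positions of rows whose local time is within 09:30-15:59
# (both ends are included by default).
locs = df.index.indexer_between_time("9:30", "15:59")

# Start with all False, then set the matching positions to True.
mask = np.zeros(len(df), dtype=bool)
mask[locs] = True
df["rth"] = mask
```

If 1/0 is preferred over True/False, `mask.astype(int)` can be assigned instead.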
Below are some alternative methods I've tried. The apply method doesn't work on an index, so I first have to copy the index over to a column (bar_start here).
import datetime
df['rth'] = df['bar_start'].apply(lambda dt: 0 if dt.time() < datetime.time(9, 30) or dt.time() > datetime.time(15, 59) else 1)
Looping with .loc is slow:
for i in range(len(df)):
    dt = df.index[i]
    # .loc takes the row label first, then the column name
    if dt.time() < datetime.time(9, 30) or dt.time() > datetime.time(15, 59):
        df.loc[dt, 'rth'] = 0
    else:
        df.loc[dt, 'rth'] = 1
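For comparison, the same 09:30-15:59 test can probably be written without apply or a Python loop by working on the index attributes directly. The sketch below uses synthetic data (the date range, time zone, and Volume column are assumptions for illustration) and converts each timestamp to minutes since local midnight.

```python
import pandas as pd

# Synthetic one-minute bars for a single day (illustrative data only).
idx = pd.date_range("2020-10-26 04:00", "2020-10-26 19:59",
                    freq="min", tz="America/New_York")
df = pd.DataFrame({"Volume": 0}, index=idx)

# Minutes since local midnight, computed for the whole index at once.
minutes = df.index.hour * 60 + df.index.minute

# 09:30 is minute 570 and 15:59 is minute 959; both ends are included.
in_rth = (minutes >= 9 * 60 + 30) & (minutes <= 15 * 60 + 59)
df["rth"] = in_rth.astype(int)
```

This avoids copying the index into a column, since hour and minute are available on a DatetimeIndex itself.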
Upvotes: 0
Views: 334
Reputation: 1627
df = df.reset_index()
df['rth'] = pd.to_datetime(df['Datetime']).dt.time.between(datetime.time(9, 30), datetime.time(15, 59)).astype(int)
Upvotes: 0
Reputation: 16683
You can use:
df = df.reset_index()
d = pd.to_datetime(df['Datetime'].str[:-6])
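# Compare minutes since midnight; testing hour and minute separately would drop times such as 10:15
m = (d.dt.hour * 60 + d.dt.minute).between(9 * 60 + 30, 15 * 60 + 59)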
df['rth'] = np.where(m, 1, 0)
df
Upvotes: 0
Reputation: 26676
Please try:
df = df.reset_index()  # Reset index
# Format Datetime as zero-padded 'HH:MM' strings (%M is minutes, %m is month) and compare
df['status'] = pd.to_datetime(df['Datetime']).dt.strftime('%H:%M').between('09:30', '15:59').astype(int)
Upvotes: 1
Reputation: 2583
Use np.where, it's faster:
t = df['bar_start'].dt.time  # assumes bar_start is a datetime column; extract the time of day before comparing
df['rth'] = np.where((t < datetime.time(9, 30)) | (t > datetime.time(15, 59)), False, True)
Upvotes: 0