user1071182

Reputation: 1627

Create a boolean column based on a timestamp index

I have a time series DataFrame and need to create a boolean column that flags whether each row's time falls between certain hours of the day. I can get an array of the matching index positions, but how do I turn that into a boolean column? Is the indexer_between_time method the quickest way to do this calculation?

aapl.csv

Datetime,Open,High,Low,Close,Volume,Dividends,Stock Splits
2020-10-26 04:15:00-04:00,113.7,113.78,113.5,113.5,0,0,0
2020-10-26 04:16:00-04:00,113.5,113.72,113.5,113.72,0,0,0
2020-10-26 04:17:00-04:00,113.69,113.79,113.65,113.74,0,0,0
2020-10-26 04:18:00-04:00,113.65,113.65,113.59,113.6,0,0,0
2020-10-26 04:19:00-04:00,113.55,113.59,113.4,113.54,0,0,0
2020-10-26 04:20:00-04:00,113.5,113.68,113.5,113.68,0,0,0
2020-10-26 04:21:00-04:00,113.71,113.71,113.6,113.6,0,0,0
2020-10-26 04:22:00-04:00,113.68,113.68,113.67,113.68,0,0,0


>>> df = pd.read_csv("aapl.csv", index_col="Datetime", parse_dates=True)
>>> df.head()
                             Open    High     Low   Close  Volume  Dividends  Stock Splits
Datetime                                                                                  
2020-10-26 04:15:00-04:00  113.70  113.78  113.50  113.50       0          0             0
2020-10-26 04:16:00-04:00  113.50  113.72  113.50  113.72       0          0             0
2020-10-26 04:17:00-04:00  113.69  113.79  113.65  113.74       0          0             0
2020-10-26 04:18:00-04:00  113.65  113.65  113.59  113.60       0          0             0
2020-10-26 04:19:00-04:00  113.55  113.59  113.40  113.54       0          0             0


df.index.indexer_between_time('9:30','15:59')
array([ 264,  265,  266, ..., 4166, 4167, 4168])


df['rth'] = ... 1 if in above array, else 0

Below are some alternative methods I've tried. The apply method doesn't work on an index, so I first have to copy the index over to a column.

df['rth'] = df['bar_start'].apply(lambda dt: '0' if dt.time() < datetime.time(9,30) or dt.time() > datetime.time(15,59) else '1')

The loc method is slow.

for i in range(len(df)):
    dt = df.index[i]
    # .loc takes the row label first, then the column name
    if dt.time() < datetime.time(9,30) or dt.time() > datetime.time(15,59):
        df.loc[dt, 'rth'] = 0
    else:
        df.loc[dt, 'rth'] = 1

Upvotes: 0

Views: 334

Answers (4)

user1071182

Reputation: 1627

df=df.reset_index()    
df['rth'] = pd.to_datetime(df['Datetime']).dt.time.between(datetime.time(9,30),datetime.time(15,59)).astype(int)
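
The positions returned by indexer_between_time in the question can also be turned into a column directly, without resetting the index. Below is a minimal sketch of that idea; the sample frame is made up to stand in for aapl.csv (one bar per minute on a single day), and the names pos and rth are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for aapl.csv: one bar per minute, tz-aware index.
idx = pd.date_range("2020-10-26 04:15", "2020-10-26 19:59", freq="min",
                    tz="America/New_York")
df = pd.DataFrame({"Close": np.arange(len(idx), dtype=float)}, index=idx)

# Integer positions of the rows whose time of day is within 09:30-15:59
# (both end points are included by default).
pos = df.index.indexer_between_time("9:30", "15:59")

# Start from all zeros, then flag the matching positions.
rth = np.zeros(len(df), dtype=int)
rth[pos] = 1
df["rth"] = rth
```

Because pos holds integer positions rather than labels, it is used to index a plain NumPy array of the same length as the frame; use dtype=bool instead of int if a true boolean column is wanted.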

Upvotes: 0

David Erickson

Reputation: 16683

You can use:

df = df.reset_index()
d = pd.to_datetime(df['Datetime'].str[:-6])
# compare minutes since midnight so that e.g. 10:00-10:29 is not excluded
mins = d.dt.hour * 60 + d.dt.minute
m = mins.between(9 * 60 + 30, 15 * 60 + 59)
df['rth'] = np.where(m, 1, 0)
df
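
A variant of this approach can work on the DatetimeIndex itself, so neither reset_index nor string slicing is needed. The sketch below assumes the index is already a DatetimeIndex and uses made-up sample data; it converts each timestamp to minutes since midnight so that the hour and the minute are tested together rather than as two separate ranges.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for aapl.csv: one bar per minute, tz-aware index.
idx = pd.date_range("2020-10-26 04:15", "2020-10-26 19:59", freq="min",
                    tz="America/New_York")
df = pd.DataFrame({"Close": np.arange(len(idx), dtype=float)}, index=idx)

# Minutes since local midnight for every row.
minutes = df.index.hour * 60 + df.index.minute

# 09:30 is minute 570 and 15:59 is minute 959; both ends are included.
mask = (minutes >= 9 * 60 + 30) & (minutes <= 15 * 60 + 59)
df["rth"] = mask.astype(int)
```

Testing the hour range and the minute range independently would keep only minutes 30 to 59 of each hour, which is why a single combined value is compared here.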

Upvotes: 0

wwnde

Reputation: 26676

Please try:

df=df.reset_index()#Reset index

#Coerce Datetime to str H:M and compare
df['status']=pd.to_datetime(df['Datetime']).dt.strftime('%H:%M').between('09:30','15:59').astype(int)

Upvotes: 1

Mehdi Golzadeh

Reputation: 2583

Use np.where; it's faster:

t = df['bar_start'].dt.time  # time of day; assumes bar_start is a datetime column
df['rth'] = np.where((t < datetime.time(9,30)) | (t > datetime.time(15,59)), False, True)
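
Which approach is faster depends on the data size and the pandas version, so it is worth measuring on your own frame. The following is only a sketch of how that could be done with timeit: the sample data and the helper names via_indexer and via_time_objects are made up, bar_start is assumed to be a datetime copy of the index as in the question, and no particular timing result is implied.

```python
import datetime
import timeit

import numpy as np
import pandas as pd

# Hypothetical minute bars; bar_start mirrors the index as in the question.
idx = pd.date_range("2020-10-05", periods=20_000, freq="min",
                    tz="America/New_York")
df = pd.DataFrame({"bar_start": idx}, index=idx)


def via_indexer(frame):
    """Flag rows using the positions from indexer_between_time."""
    flags = np.zeros(len(frame), dtype=bool)
    flags[frame.index.indexer_between_time("9:30", "15:59")] = True
    return flags


def via_time_objects(frame):
    """Flag rows by comparing datetime.time objects with np.where."""
    t = frame["bar_start"].dt.time
    outside = (t < datetime.time(9, 30)) | (t > datetime.time(15, 59))
    return np.where(outside, False, True)


# Both approaches should agree before their speed is compared.
same = bool((via_indexer(df) == via_time_objects(df)).all())

t_indexer = timeit.timeit(lambda: via_indexer(df), number=5)
t_objects = timeit.timeit(lambda: via_time_objects(df), number=5)
print(f"same result: {same}")
print(f"indexer_between_time: {t_indexer:.4f}s, time objects: {t_objects:.4f}s")
```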

Upvotes: 0
