Reputation: 4242
I have a pandas dataframe data_ask_bid
indexed by a Datetime stamp for which I want to keep only the rows in the date range: Monday @ 00:00 - Friday @ 21:59. For this, I wrote the following line:
data_ask_bid = data_ask_bid[((0 <= data_ask_bid.index.weekday <= 3) | (data_ask_bid.index.weekday == 4 & data_ask_bid.index.hour < 22))]
Although there seems to be a problem with the logical indexing, as it's throwing the error 'The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()'. Where did I go wrong in the code?
Upvotes: 1
Views: 450
Reputation: 863701
I think you can use for check values numpy.in1d
:
mask1 = np.in1d(data_ask_bid.index.weekday, [0,1,2,3])
mask2 = data_ask_bid.index.weekday == 4
mask3 = data_ask_bid.index.hour < 22
mask = mask1 | (mask2 & mask3)
data_ask_bid = data_ask_bid[mask]
Sample:
start = pd.to_datetime('2017-02-10 15:00:00')
rng = pd.date_range(start, periods=20, freq='7h')
data_ask_bid = pd.DataFrame({'a': range(20)}, index=rng)
#print (data_ask_bid)
w = data_ask_bid.index.weekday
mask1 = np.in1d(w, [0,1,2,3])
mask2 = w == 4
mask3 = data_ask_bid.index.hour < 22
mask = mask1 | (mask2 & mask3)
print (mask)
[ True False False False False False False False False True True True
True True True True True True True True]
data_ask_bid = data_ask_bid[mask]
print (data_ask_bid)
a
2017-02-10 15:00:00 0
2017-02-13 06:00:00 9
2017-02-13 13:00:00 10
2017-02-13 20:00:00 11
2017-02-14 03:00:00 12
2017-02-14 10:00:00 13
2017-02-14 17:00:00 14
2017-02-15 00:00:00 15
2017-02-15 07:00:00 16
2017-02-15 14:00:00 17
2017-02-15 21:00:00 18
2017-02-16 04:00:00 19
Timings:
start = pd.to_datetime('2017-02-10 15:00:00')
N = 1000000
rng = pd.date_range(start, periods=N, freq='H')
data_ask_bid = pd.DataFrame({'a': range(N)}, index=rng)
print (data_ask_bid)
def jez(data_ask_bid):
w = data_ask_bid.index.weekday
mask1 = np.in1d(w, [0,1,2,3])
mask2 = w == 4
mask3 = data_ask_bid.index.hour < 22
data_ask_bid = data_ask_bid[mask1 | (mask2 & mask3)]
return (data_ask_bid)
print (jez(data_ask_bid))
print (data_ask_bid[(((data_ask_bid.index.weekday >= 0) & (data_ask_bid.index.weekday <= 3)) | ((data_ask_bid.index.weekday == 4) & (data_ask_bid.index.hour < 22)))])
In [273]: %timeit (jez(data_ask_bid))
10 loops, best of 3: 142 ms per loop
In [274]: %timeit (data_ask_bid[(((data_ask_bid.index.weekday >= 0) & (data_ask_bid.index.weekday <= 3)) | ((data_ask_bid.index.weekday == 4) & (data_ask_bid.index.hour < 22)))])
1 loop, best of 3: 267 ms per loop
Upvotes: 1
Reputation: 4242
Just found out that Pandas doesn't work with a clause of the type 0 <= data_ask_bid.index.weekday <= 3
so I needed to divide it into 2 separate clauses for it to work:
data_ask_bid = data_ask_bid[(((data_ask_bid.index.weekday >= 0) & (data_ask_bid.index.weekday <= 3)) | ((data_ask_bid.index.weekday == 4) & (data_ask_bid.index.hour < 22)))]
Upvotes: 0