Reputation: 1
I have a datetime-indexed DataFrame with several years of intraday data in 2-minute increments. I want to group by day and, for each day, select the first row whose price exceeds that day's price at 06:30:00.
df:
Price
2009-10-12 06:30:00 904
2009-10-12 06:32:00 904
2009-10-12 06:34:00 904.5
2009-10-12 06:36:00 905
2009-10-12 06:38:00 905.5
2009-10-13 06:30:00 901
2009-10-13 06:32:00 901
2009-10-13 06:34:00 901
2009-10-13 06:36:00 902
2009-10-13 06:38:00 903
I've tried using .groupby and .apply with a lambda function to group by day and keep all rows that exceed the value at 06:30:00, but I get an error:
onh = pd.to_datetime('6:30:00').time()
onhBreak = df.groupby(df.index.date).apply(lambda x: x[x > x.loc[onh]])
ValueError: Can only compare identically-labeled Series objects
Desired output:
Price
2009-10-12 06:34:00 904.5
2009-10-13 06:36:00 902
If these rows came back as the values of a groupby, that would also be fine.
Any help is appreciated.
Upvotes: 0
Reputation: 30930
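Assuming the timestamps are in a Date column and the prices in a Value column, this selects the first row after 06:30:00 in each day: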
mask_date = df['Date'].dt.time.gt(pd.to_datetime('06:30:00').time())
df_filtered = df.loc[mask_date.groupby(df['Date'].dt.date).idxmax()]
print(df_filtered)
Output
Date Value
1 2009-10-12 06:32:00 904.0
6 2009-10-13 06:32:00 901.0
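The question asks for the first row whose price exceeds the 06:30:00 price, rather than the first row after 06:30:00. A minimal sketch of that price comparison follows, assuming the same Date/Value column layout and assuming that the first row of every day is the 06:30:00 bar (as in the sample data). The names ref_price, mask_price and first_break are illustrative only.

```python
import pandas as pd

# Sample data from the question, laid out with 'Date' and 'Value' columns.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2009-10-12 06:30:00", "2009-10-12 06:32:00", "2009-10-12 06:34:00",
        "2009-10-12 06:36:00", "2009-10-12 06:38:00",
        "2009-10-13 06:30:00", "2009-10-13 06:32:00", "2009-10-13 06:34:00",
        "2009-10-13 06:36:00", "2009-10-13 06:38:00",
    ]),
    "Value": [904, 904, 904.5, 905, 905.5, 901, 901, 901, 902, 903],
})

# Calendar day of each row, used as the grouping key.
day = df["Date"].dt.date

# Assumption: each day's first row is the 06:30:00 bar, so 'first'
# broadcasts that day's reference price to every row of the day.
ref_price = df.groupby(day)["Value"].transform("first")

# True where the price is strictly above that day's 06:30:00 price.
mask_price = df["Value"].gt(ref_price)

# Keep only the rows above the reference, then take the earliest per day.
first_break = df[mask_price].groupby(day[mask_price]).head(1)
print(first_break)
```

Days on which the price never exceeds the reference simply do not appear in first_break, because they contribute no rows to the filtered frame.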
Upvotes: 0
Reputation: 323366
Here we need groupby with idxmax:
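# onh is the 06:30:00 time object defined in the question
df = df.to_frame('value')  # only needed if the data is a Series
df['check'] = df.index.time > onh  # True for rows after 06:30:00
subdf = df.loc[df.groupby(df.index.date)['check'].idxmax()]  # first True row per day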
Out[237]:
value check
2009-10-12 00:00:00 900.0 False
2020-05-29 13:08:00 3052.0 True
subdf = subdf[subdf['check']]  # drop days that have no row after 06:30:00
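To compare against the 06:30:00 price itself on a datetime-indexed frame, one option is to look that price up per day with at_time and keep the first row above it. The following is only a sketch under assumptions: the column is named value, the helper first_break_of_day is hypothetical, and a day with no 06:30:00 row or no higher price is meant to yield an empty result.

```python
import datetime as dt
import pandas as pd

# Sample data from the question, indexed by timestamp.
idx = pd.to_datetime([
    "2009-10-12 06:30:00", "2009-10-12 06:32:00", "2009-10-12 06:34:00",
    "2009-10-12 06:36:00", "2009-10-12 06:38:00",
    "2009-10-13 06:30:00", "2009-10-13 06:32:00", "2009-10-13 06:34:00",
    "2009-10-13 06:36:00", "2009-10-13 06:38:00",
])
df = pd.DataFrame(
    {"value": [904, 904, 904.5, 905, 905.5, 901, 901, 901, 902, 903]},
    index=idx,
)

onh = dt.time(6, 30)


def first_break_of_day(day_df):
    """Return the first row of one day whose value exceeds the 06:30:00 value."""
    ref = day_df.at_time(onh)["value"]  # row(s) stamped exactly 06:30:00
    if ref.empty:
        return day_df.iloc[0:0]  # no reference price for this day
    above = day_df[day_df["value"] > ref.iloc[0]]
    return above.head(1)  # empty if the price never exceeds the reference


# Process each calendar day separately and stitch the results together.
result = pd.concat(first_break_of_day(g) for _, g in df.groupby(df.index.date))
print(result)
```

Because the reference price is looked up by time, this version does not depend on 06:30:00 being the first row of the day.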
Upvotes: 0