Reputation: 7072
I found this example. I need to get every pair of consecutive rows that have the same price and are at most one second apart. It's the same problem as in the other question, but the solution there isn't working for me.
I'm using Python 3.6.3 and pandas 0.22.0.
My df:
timestamp price
0 2018-04-29 13:14:43.026 6394.0
1 2018-04-29 13:16:53.714 6378.0
2 2018-04-29 13:17:01.324 6378.0
3 2018-04-29 13:17:02.246 6384.0
4 2018-04-29 13:17:07.413 6384.0
5 2018-04-29 13:17:08.331 6380.0
6 2018-04-29 13:17:11.437 6380.0
7 2018-04-29 13:17:11.895 6377.0
8 2018-04-29 13:17:13.449 6377.0
9 2018-04-29 13:17:13.452 6379.0
df.dtypes:
timestamp datetime64[ns]
price float64
dtype: object
I tried the following code:
data = df[(df.groupby(["price"], as_index=False)["timestamp"].diff().fillna(0).dt.seconds <= 1).reset_index(drop=True)]
and this is the result:
timestamp price
0 2018-04-29 13:14:43.026 6394.0
1 2018-04-29 13:16:53.714 6378.0
2 2018-04-29 13:17:01.324 6378.0
4 2018-04-29 13:17:07.413 6384.0
5 2018-04-29 13:17:08.331 6380.0
7 2018-04-29 13:17:11.895 6377.0
9 2018-04-29 13:17:13.452 6379.0
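A few things in that attempt probably let extra rows through. `fillna(0)` turns the gap for the first row of each price group into zero, so that row always passes. Grouping by price also compares rows that are not adjacent, and `reset_index(drop=True)` throws away the index that `df[...]` uses to align the mask. The most direct problem is that `.dt.seconds` is only the whole-seconds *component* of a timedelta, not its total length, so a 1.554 s gap passes a `<= 1` test. A minimal check, using the timestamps of rows 7 and 8 above:

```python
import pandas as pd

# Rows 7 and 8 of the first dataframe: same price, 1.554 s apart
gap = pd.Timestamp('2018-04-29 13:17:13.449') - pd.Timestamp('2018-04-29 13:17:11.895')

print(gap.seconds)                    # seconds component only: 1
print(round(gap.total_seconds(), 3))  # full length of the gap: 1.554
print(gap.seconds <= 1, gap.total_seconds() <= 1)  # True False
```

`Series.dt.seconds` and `Series.dt.total_seconds()` behave the same way element-wise, so the comparison should use `total_seconds()` or compare against a `pd.Timedelta` directly.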
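EDIT:
For the dataframe above, the output should be empty, because no two consecutive rows with the same price are within one second of each other. But if the dataframe is: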
timestamp price
0 2018-04-29 13:14:43.026 6394.0
1 2018-04-29 13:16:53.714 6378.0
2 2018-04-29 13:17:01.324 6378.0
3 2018-04-29 13:17:02.246 6378.0
4 2018-04-29 13:17:07.413 6384.0
5 2018-04-29 13:17:08.331 6380.0
6 2018-04-29 13:17:11.437 6380.0
7 2018-04-29 13:17:11.895 6377.0
8 2018-04-29 13:17:13.449 6377.0
9 2018-04-29 13:17:13.452 6377.0
it should output:
timestamp price
2 2018-04-29 13:17:01.324 6378.0
3 2018-04-29 13:17:02.246 6378.0
8 2018-04-29 13:17:13.449 6377.0
9 2018-04-29 13:17:13.452 6377.0
Upvotes: 0
Views: 132
Reputation: 4536
Unless I'm mistaken, you're overcomplicating this a bit: you just need to take the diff between consecutive rows.
import pandas as pd

df = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2018-04-29 13:14:43.026', '2018-04-29 13:16:53.714',
        '2018-04-29 13:17:01.324', '2018-04-29 13:17:02.246',
        '2018-04-29 13:17:07.413', '2018-04-29 13:17:08.331',
        '2018-04-29 13:17:11.437', '2018-04-29 13:17:11.895',
        '2018-04-29 13:17:13.449', '2018-04-29 13:17:13.452',
    ]),
    'price': [6394.0, 6378.0, 6378.0, 6378.0, 6384.0,
              6380.0, 6380.0, 6377.0, 6377.0, 6379.0],
})

# Difference between each row and the row directly before it
time_gap = df['timestamp'].diff()
price_change = df['price'].diff()

# Use total_seconds(): .dt.seconds is only the whole-seconds part,
# so a 1.554 s gap would count as 1 and pass the test
selection = (time_gap.dt.total_seconds() <= 1) & (price_change == 0)
# Also keep the earlier row of each matching pair
selection = selection | selection.shift(periods=-1, fill_value=False)
print(df[selection])
Output
timestamp price
2 2018-04-29 13:17:01.324 6378.0
3 2018-04-29 13:17:02.246 6378.0
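The same idea can be wrapped in a small reusable function. This is only a sketch: `rows_with_quick_repeat` and `make_df` are hypothetical names, it assumes the rows are already sorted by timestamp, and it compares the gap against a `pd.Timedelta` instead of converting to seconds, which sidesteps the `.dt.seconds` pitfall. Run against both dataframes from the question, it gives an empty result for the first one and rows 2, 3, 8 and 9 for the edited one:

```python
import pandas as pd


def rows_with_quick_repeat(df, window=pd.Timedelta(seconds=1)):
    """Return rows that share their price with an adjacent row
    and lie within `window` of it (assumes df is sorted by time)."""
    # Compare each row with the one directly before it; NaT/NaN give False
    close = df['timestamp'].diff() <= window
    same = df['price'].diff() == 0
    second_of_pair = close & same
    # The row just before a matching row is the first element of the pair
    first_of_pair = second_of_pair.shift(-1, fill_value=False)
    return df[second_of_pair | first_of_pair]


def make_df(prices):
    # Timestamps shared by both dataframes in the question
    times = pd.to_datetime([
        '2018-04-29 13:14:43.026', '2018-04-29 13:16:53.714',
        '2018-04-29 13:17:01.324', '2018-04-29 13:17:02.246',
        '2018-04-29 13:17:07.413', '2018-04-29 13:17:08.331',
        '2018-04-29 13:17:11.437', '2018-04-29 13:17:11.895',
        '2018-04-29 13:17:13.449', '2018-04-29 13:17:13.452',
    ])
    return pd.DataFrame({'timestamp': times, 'price': prices})


original = make_df([6394.0, 6378.0, 6378.0, 6384.0, 6384.0,
                    6380.0, 6380.0, 6377.0, 6377.0, 6379.0])
edited = make_df([6394.0, 6378.0, 6378.0, 6378.0, 6384.0,
                  6380.0, 6380.0, 6377.0, 6377.0, 6377.0])

print(rows_with_quick_repeat(original).index.tolist())  # []
print(rows_with_quick_repeat(edited).index.tolist())    # [2, 3, 8, 9]
```

Passing a different `window` (for example `pd.Timedelta(milliseconds=500)`) changes the threshold without touching the logic.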
Upvotes: 1