Filtering out consecutive rows from dataset

Question

I have a dataset with the index values and one datetime variable as follows:

1      2017-01-03 09:30:01.958
46     2017-01-03 09:30:47.879
99     2017-01-03 09:33:48.121
117    2017-01-03 09:47:06.215
139    2017-01-03 09:51:06.054
1567   2017-01-03 14:17:18.949
2480   2017-01-03 15:57:13.442
2481   2017-01-03 15:57:14.333
2486   2017-01-03 15:57:37.500
2487   2017-01-03 15:57:38.677
2489   2017-01-03 15:57:41.053
2491   2017-01-03 15:57:54.870
2498   2017-01-03 15:59:24.210

What I am trying to do is to remove consecutive rows from the data (only retaining the first observation from the segment), in this case the code should drop the rows with index 2481 and 2487. I tried using

df[df.index.diff() == 0].drop()

but it only returned

AttributeError: 'Int64Index' object has no attribute 'diff'

jezrael · Accepted Answer

You can use boolean indexing, for working with index with not implemented method use to_series:

df = df[df.index.to_series().diff() != 1]
print (df)
                         date
1     2017-01-03 09:30:01.958
46    2017-01-03 09:30:47.879
99    2017-01-03 09:33:48.121
117   2017-01-03 09:47:06.215
139   2017-01-03 09:51:06.054
1567  2017-01-03 14:17:18.949
2480  2017-01-03 15:57:13.442
2486  2017-01-03 15:57:37.500
2489  2017-01-03 15:57:41.053
2491  2017-01-03 15:57:54.870
2498  2017-01-03 15:59:24.210

Thank you piRSquared for numpy alternative:

df[np.append(0, np.diff(df.index.values)) != 1]

Timings:

#[11000 rows x 1 columns]
df = pd.concat([df]*1000)

In [60]: %timeit [True] + [(i[0]+1) != i[1] for i in zip(df.index.tolist(), df.index.tolist()[1:])]
100 loops, best of 3: 4.19 ms per loop

In [61]: %timeit np.append(0, np.diff(df.index.values)) != 1
The slowest run took 4.72 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 33.1 µs per loop

In [62]: %timeit df.index.to_series().diff() != 1
1000 loops, best of 3: 260 µs per loop

Filtering out consecutive rows from dataset

Answers (2)

Related Questions