amarykya_ishtmella
amarykya_ishtmella

Reputation: 183

How to filter for rows that have greater than a specified count of consecutive integers

I have a dataframe that is like this:

Frame_Number Parts X Y

7      Y.     7. 9
7      G      :  :
8      Y      :  :
8      Y      :  :
8      Y      :  :
9      :      :  :
10
18
18
18
18
19
20
20
21
21
22
23
24
24
25
25
25
26
27
28
29
29
29
29
30
42
45
80
80
80
81
81
81
82
82
83
109
109
120
121
122
123
124
125
126
127
128
129
130
131
132
132
132
133
190
200
202
204
205
206
:
1000

I want to select the subset of this dataframe that has at least 25 consecutive number.

For example, in my dataset there is [1,1,2,3,5,6,8,9,10..33,34,35,36,37,40,.65.1000]

So, here from 8 to 37, we have consecutive values and more than 25 (different)numbers. Also, from 40 to 65 we say that there are 25 consecutive numbers present. So, I want to select these sets of rows where there are at least 25 or more consecutive data frames present as a subset.

Upvotes: 0

Views: 613

Answers (1)

David Erickson
David Erickson

Reputation: 16683

  1. Create groups of consecutive integers:

You can create a series s that takes the cumulative count of non-consecutive integers. That means consecutive rows will have the same values in the series s as they are in the same "group" of consecutive integers

  1. Count consecutive integers within groups:

Then, you can create m by calculating the count of consecutive integers in each group using .groupby(s)

  1. Filter for groups that have consecutive values > n:

Filter the df by m where greater than any value you specify (25 in this case)


df = df.sort_values('A').drop_duplicates(subset='A')
s = (df['A'] != df['A'].shift(1) + 1).cumsum()
m = df.groupby(s).transform('count')['A']
df = df[m>10]
df
Out[1]: 
      A
7    18
11   19
12   20
15   21
16   22
17   23
18   24
20   25
23   26
24   27
25   28
26   29
30   30
44  120
45  121
46  122
47  123
48  124
49  125
50  126
51  127
52  128
53  129
54  130
55  131
56  132
59  133

Another option to keep dupliccates:

df = d2.copy()
s = ((df['A'] != df['A'].shift(1)) & (df['A'] != df['A'].shift(1) + 1)).cumsum()
m = df.groupby(s).transform('count')['A']
df = df[m>10]
df
Out[1]: 
      A
7    18
8    18
9    18
10   18
11   19
12   20
13   20
14   21
15   21
16   22
17   23
18   24
19   24
20   25
21   25
22   25
23   26
24   27
25   28
26   29
27   29
28   29
29   29
30   30
44  120
45  121
46  122
47  123
48  124
49  125
50  126
51  127
52  128
53  129
54  130
55  131
56  132
57  132
58  132
59  133

Upvotes: 2

Related Questions