Reputation: 183
I have a dataframe that is like this:
Frame_Number Parts X Y
7 Y. 7. 9
7 G : :
8 Y : :
8 Y : :
8 Y : :
9 : : :
10
18
18
18
18
19
20
20
21
21
22
23
24
24
25
25
25
26
27
28
29
29
29
29
30
42
45
80
80
80
81
81
81
82
82
83
109
109
120
121
122
123
124
125
126
127
128
129
130
131
132
132
132
133
190
200
202
204
205
206
:
1000
I want to select the subset of this dataframe that has at least 25 consecutive number.
For example, in my dataset there is [1,1,2,3,5,6,8,9,10..33,34,35,36,37,40,.65.1000]
So, here from 8 to 37, we have consecutive values and more than 25 (different)numbers. Also, from 40 to 65 we say that there are 25 consecutive numbers present. So, I want to select these sets of rows where there are at least 25 or more consecutive data frames present as a subset.
Upvotes: 0
Views: 613
Reputation: 16683
You can create a series s
that takes the cumulative count of non-consecutive integers. That means consecutive rows will have the same values in the series s
as they are in the same "group" of consecutive integers
Then, you can create m
by calculating the count
of consecutive integers in each group using .groupby(s)
Filter the df
by m
where greater than any value you specify (25 in this case)
df = df.sort_values('A').drop_duplicates(subset='A')
s = (df['A'] != df['A'].shift(1) + 1).cumsum()
m = df.groupby(s).transform('count')['A']
df = df[m>10]
df
Out[1]:
A
7 18
11 19
12 20
15 21
16 22
17 23
18 24
20 25
23 26
24 27
25 28
26 29
30 30
44 120
45 121
46 122
47 123
48 124
49 125
50 126
51 127
52 128
53 129
54 130
55 131
56 132
59 133
Another option to keep dupliccates:
df = d2.copy()
s = ((df['A'] != df['A'].shift(1)) & (df['A'] != df['A'].shift(1) + 1)).cumsum()
m = df.groupby(s).transform('count')['A']
df = df[m>10]
df
Out[1]:
A
7 18
8 18
9 18
10 18
11 19
12 20
13 20
14 21
15 21
16 22
17 23
18 24
19 24
20 25
21 25
22 25
23 26
24 27
25 28
26 29
27 29
28 29
29 29
30 30
44 120
45 121
46 122
47 123
48 124
49 125
50 126
51 127
52 128
53 129
54 130
55 131
56 132
57 132
58 132
59 133
Upvotes: 2