Reputation: 629
It is difficult to describe the problem in Title. Let me give below example:
I have below dataframe
id value
asd123 2
asd123 4
asd124 15
asd124 32
asd125 2
asd125 3
asd126 5
asd126 14
asd126 22
asd127 7
asd127 17
asd127 30
asd128 23
asd128 28
asd129 3
asd130 25
asd131 56
asd132 3
asd133 53
I would need to look for same id group and two consecutive rows in value column difference greater than 5.(value column always asc) If it is just one record, it doesn't need to look for the difference for the value.
I tried adding a sequence column for my DF:
id value sequence
asd123 2 1
asd123 4 2
asd124 15 1
asd124 32 2
asd125 2 1
asd125 3 2
asd126 5 1
asd126 14 2
asd126 22 3
asd127 7 1
asd127 17 2
asd127 30 3
asd128 23 1
asd128 28 2
asd129 3 1
asd130 25 1
asd131 56 1
asd132 3 1
asd133 53 1
Basically, i can drop if sequence = 1 records, for sequence greater than 1, I need to look for the difference of Values column among the rows, if values difference greater than 5, i need to filter out that records. For examples like 3 rows, i need to look two consecutive rows and minus seq 2 - seq 1 values and 3-2, if either results greater than 5, i will need id.
Thank you in advance.
df['sequence']=df.groupby('id').cumcount()
df_2=df[df['sequence']>=2]
Upvotes: 0
Views: 741
Reputation: 153460
IIUC:
df.set_index('id').groupby(level=0).diff().query('value > 5').index.unique().tolist()
Output:
['asd124', 'asd126', 'asd127']
Upvotes: 1