Jay Zhang
Jay Zhang

Reputation: 629

Pandas DataFrame Difference among Rows for Same Group

It is difficult to describe the problem in Title. Let me give below example:

I have below dataframe

id  value
asd123  2
asd123  4
asd124  15
asd124  32
asd125  2
asd125  3
asd126  5
asd126  14
asd126  22
asd127  7
asd127  17
asd127  30
asd128  23
asd128  28
asd129  3
asd130  25
asd131  56
asd132  3
asd133  53

I would need to look for same id group and two consecutive rows in value column difference greater than 5.(value column always asc) If it is just one record, it doesn't need to look for the difference for the value.

I tried adding a sequence column for my DF:

id  value   sequence
asd123  2   1
asd123  4   2
asd124  15  1
asd124  32  2
asd125  2   1
asd125  3   2
asd126  5   1
asd126  14  2
asd126  22  3
asd127  7   1
asd127  17  2
asd127  30  3
asd128  23  1
asd128  28  2
asd129  3   1
asd130  25  1
asd131  56  1
asd132  3   1
asd133  53  1

Basically, i can drop if sequence = 1 records, for sequence greater than 1, I need to look for the difference of Values column among the rows, if values difference greater than 5, i need to filter out that records. For examples like 3 rows, i need to look two consecutive rows and minus seq 2 - seq 1 values and 3-2, if either results greater than 5, i will need id.

Thank you in advance.

df['sequence']=df.groupby('id').cumcount()
df_2=df[df['sequence']>=2]

Upvotes: 0

Views: 741

Answers (1)

Scott Boston
Scott Boston

Reputation: 153460

IIUC:

df.set_index('id').groupby(level=0).diff().query('value > 5').index.unique().tolist()

Output:

['asd124', 'asd126', 'asd127']

Upvotes: 1

Related Questions