Pandas select rows when column value within range from another row column value with group filter

Question

I would like to extend a question I asked earlier (link to question).

The scenario here is more complex, so I don't think the solutions there will fit.

I'm trying to create a subset of a DataFrame (100k-500k rows) with the following format:

import pandas as pd

d = {'time':[1,2,3,5,7,9,9.5,10], 'val':['not','match','match','not','not','match','match','match'],
    'group':['a','a','b','b','b','a','a','c']}
df = pd.DataFrame(d)
print(df)
  group  time    val
0     a   1.0    not
1     a   2.0  match
2     b   3.0  match
3     b   5.0    not
4     b   7.0    not
5     a   9.0  match
6     a   9.5  match
7     c  10.0  match

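I want to select every row whose time is within a limited range of another row's time, but only when those two rows belong to different groups. For example, if the range is <= 1, the first three and the last three rows pass the time check, and the group check then decides which of them are kept: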

row0 has a valid time diff (row1-row0), but both rows are in the same group.
row1 has a valid time diff (row2-row1), and the two rows are in different groups.
row5 has a valid time diff (row7-row5), and the two rows are in different groups.
row6 has a valid time diff (row7-row6), and the two rows are in different groups.

My desired output:

  group  time    val
1     a   2.0  match
2     b   3.0  match
5     a   9.0  match
6     a   9.5  match
7     c  10.0  match

zipa · Accepted Answer

This works on your example and hopefully will on your data:

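# time check: within 1 of the previous or the next row
near = (df['time'].diff() <= 1) | (df['time'].diff(-1) >= -1)
# group check: group differs from the previous or the next row
prev_group = df['group'].shift(1).fillna(df['group'])
next_group = df['group'].shift(-1).fillna(df['group'])
other = (df['group'] != next_group) | (df['group'] != prev_group)
df.loc[near & other]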
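
The expression above only compares each row with its immediate neighbours, and it tests the time condition and the group condition independently of each other. On data where a close row from another group is not adjacent, a more explicit check may be needed. The following is a minimal sketch of one alternative, not part of the original answer: for each row it counts all rows inside the time window and the rows of the same group inside that window, and keeps the row when the two counts differ. The function name `rows_near_other_group` and the parameter `max_gap` are made up for illustration, and the window is assumed to be symmetric and inclusive.

```python
import numpy as np
import pandas as pd


def rows_near_other_group(df, max_gap=1.0):
    """Keep rows that have at least one row of a different group
    whose time lies within max_gap (inclusive) of their own time."""
    times = df["time"].to_numpy(dtype=float)

    # Rows of any group (including the row itself) in [t - max_gap, t + max_gap].
    all_sorted = np.sort(times)
    total = (np.searchsorted(all_sorted, times + max_gap, side="right")
             - np.searchsorted(all_sorted, times - max_gap, side="left"))

    # Rows of the same group (including the row itself) in the same window.
    same = np.zeros(len(df), dtype=np.int64)
    for positions in df.groupby("group").indices.values():
        t = times[positions]
        group_sorted = np.sort(t)
        same[positions] = (np.searchsorted(group_sorted, t + max_gap, side="right")
                           - np.searchsorted(group_sorted, t - max_gap, side="left"))

    # A surplus in the total count means some other group is inside the window.
    return df[total > same]


df = pd.DataFrame({
    "time": [1, 2, 3, 5, 7, 9, 9.5, 10],
    "val": ["not", "match", "match", "not", "not", "match", "match", "match"],
    "group": ["a", "a", "b", "b", "b", "a", "a", "c"],
})
result = rows_near_other_group(df, max_gap=1.0)
print(result.index.tolist())  # [1, 2, 5, 6, 7]
```

The work is dominated by sorting and binary searches, so it should stay practical for a few hundred thousand rows. Floating-point times that sit exactly on the window edge may need a small tolerance.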
