Reputation: 554
I would like to extend a question I asked earlier (link to question).
The scenario here is more complex, so I think the solutions there will not fit.
I'm trying to create a subset of a DataFrame (100k-500k rows) with the following format:
import pandas as pd

d = {'time':[1,2,3,5,7,9,9.5,10], 'val':['not','match','match','not','not','match','match','match'],
'group':['a','a','b','b','b','a','a','c']}
df = pd.DataFrame(d)
print(df)
  group  time    val
0     a   1.0    not
1     a   2.0  match
2     b   3.0  match
3     b   5.0    not
4     b   7.0    not
5     a   9.0  match
6     a   9.5  match
7     c  10.0  match
I want to select a subset that includes all rows whose times fall within a limited range of each other. For example, if the range is <= 1, rows 1 and 2 and the last three rows are selected, because each of those sets lies within that range and contains rows from different groups.
My desired output:
  group  time    val
1     a   2.0  match
2     b   3.0  match
5     a   9.0  match
6     a   9.5  match
7     c  10.0  match
Upvotes: 0
Views: 2319
Reputation: 27889
This works on your example and will hopefully work on your data:
df.loc[((df['time'].diff() <= 1)|(df['time'].diff(-1) >= -1))&((df['group']!=df['group'].shift(-1).fillna(df['group']))|(df['group']!=df['group'].shift(1).fillna(df['group'])))]
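For readability, the one-liner above can be unpacked into named masks. This is only a sketch: `max_gap`, `close_in_time` and `group_changes` are names made up for illustration, and it assumes the rows are already sorted by `time` (which is what lets the absolute value stand in for the `>= -1` check).

```python
import pandas as pd

df = pd.DataFrame({
    'time': [1, 2, 3, 5, 7, 9, 9.5, 10],
    'val': ['not', 'match', 'match', 'not', 'not', 'match', 'match', 'match'],
    'group': ['a', 'a', 'b', 'b', 'b', 'a', 'a', 'c'],
})

max_gap = 1  # hypothetical name for the allowed time range

# Time gap to the previous and to the next row (rows assumed sorted by time).
gap_prev = df['time'].diff()          # time[i] - time[i-1], NaN on the first row
gap_next = df['time'].diff(-1).abs()  # |time[i] - time[i+1]|, NaN on the last row
close_in_time = (gap_prev <= max_gap) | (gap_next <= max_gap)

# Does the group differ from the previous or the next row?
# fillna makes the first/last row compare against itself, i.e. "no change".
prev_group = df['group'].shift(1).fillna(df['group'])
next_group = df['group'].shift(-1).fillna(df['group'])
group_changes = (df['group'] != prev_group) | (df['group'] != next_group)

subset = df.loc[close_in_time & group_changes]
print(subset)
```

Note that both conditions only look at directly adjacent rows, and they are checked independently of each other. Row 5, for instance, is kept because it is close in time to row 6 and its group differs from row 4, not because one neighbour satisfies both conditions. That happens to give the desired rows here, but it is worth verifying on the real data.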
Upvotes: 1