Reputation: 65
I have a pandas dataframe with a time series of a signal with some peaks identified:
Time (s)  Intensity  Peak
1         1          a
2         10         a
3         30         a
4         100        a
5         40         a
6         20         a
7         2          a
1         20         b
2         100        b
3         300        b
4         80         b
5         20         b
6         2          b
I would like to drop the rows where the Intensity value is at or below 10% of the maximum Intensity of that row's peak, in order to obtain:
Time (s)  Intensity  Peak
3         30         a
4         100        a
5         40         a
6         20         a
2         100        b
3         300        b
4         80         b
How would I do that? I looked for a groupby function that does this, but I cannot find one that fits. Thank you!
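For reproducibility, here is a minimal sketch that builds the example frame from the table above (assuming the column names are exactly `Time (s)`, `Intensity` and `Peak`) and computes the per-peak cut-off. Peak a has a maximum of 100 and peak b a maximum of 300, so the cut-offs are 10 and 30; the row with intensity 10 sits exactly on peak a's cut-off, which is why the filter has to keep only values strictly above 10%.

```python
import pandas as pd

# Example data copied from the question's table (one row per sample)
df = pd.DataFrame(
    {
        "Time (s)": [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6],
        "Intensity": [1, 10, 30, 100, 40, 20, 2, 20, 100, 300, 80, 20, 2],
        "Peak": ["a"] * 7 + ["b"] * 6,
    }
)

# Cut-off for each peak: 10% of that peak's maximum intensity
thresholds = df.groupby("Peak")["Intensity"].max() / 10
print(thresholds)
```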
Upvotes: 2
Views: 890
Reputation: 20669
You could use GroupBy.transform with max to get the maximum of each group, then take 10% of it using Series.div. Compare that with df['Intensity'] and use the result for boolean indexing.
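# Maximum Intensity of each row's peak, divided by 10 (10% of the peak max)
max_vals = df.groupby('Peak')['Intensity'].transform('max').div(10)
# True for rows strictly above their peak's threshold
mask = df['Intensity'] > max_vals
df[mask]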
#     Time (s)  Intensity  Peak
# 2          3         30     a
# 3          4        100     a
# 4          5         40     a
# 5          6         20     a
# 8          2        100     b
# 9          3        300     b
# 10         4         80     b
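A self-contained sketch of the transform approach for anyone who wants to run it end to end; the DataFrame is rebuilt from the question's table, so that data is the only assumption. Because transform('max') returns a Series with the same index as df, holding each row's group maximum, the comparison lines up row by row without any merge.

```python
import pandas as pd

# Data rebuilt from the question's table
df = pd.DataFrame(
    {
        "Time (s)": [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6],
        "Intensity": [1, 10, 30, 100, 40, 20, 2, 20, 100, 300, 80, 20, 2],
        "Peak": ["a"] * 7 + ["b"] * 6,
    }
)

# transform('max') broadcasts each group's maximum back onto every row,
# keeping df's original index
group_max = df.groupby("Peak")["Intensity"].transform("max")

# Keep rows strictly above 10% of their own peak's maximum
filtered = df[df["Intensity"] > group_max.div(10)]
print(filtered)
```

If rows sitting exactly at 10% of the maximum should be kept as well, swap `>` for `>=` (or use Series.ge).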
Upvotes: 3
Reputation:
Use groupby to generate a mask:
# group_keys=False keeps the mask on df's original row index
filtered = df[df.groupby('Peak', group_keys=False)['Intensity'].apply(lambda x: x > x.max() / 10)]
Output:
>>> filtered
    Time (s)  Intensity  Peak
2          3         30     a
3          4        100     a
4          5         40     a
5          6         20     a
8          2        100     b
9          3        300     b
10         4         80     b
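The apply-based mask only works as a boolean indexer if its index matches df's. As far as I know, from pandas 2.0 the default group_keys=True makes apply prepend the group key to the index even when the function returns a like-indexed Series, so the mask can stop aligning with df. A minimal sketch, using the question's data, that pins group_keys=False and checks the result against the equivalent transform-based mask:

```python
import pandas as pd

# Data rebuilt from the question's table
df = pd.DataFrame(
    {
        "Time (s)": [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6],
        "Intensity": [1, 10, 30, 100, 40, 20, 2, 20, 100, 300, 80, 20, 2],
        "Peak": ["a"] * 7 + ["b"] * 6,
    }
)

# Per-group boolean mask via apply; group_keys=False stops pandas from
# adding the 'Peak' key as an extra index level
mask_apply = df.groupby("Peak", group_keys=False)["Intensity"].apply(
    lambda x: x > x.max() / 10
)

# The same mask via transform, which always returns a like-indexed result
mask_transform = df.groupby("Peak")["Intensity"].transform(
    lambda x: x > x.max() / 10
)

filtered = df[mask_apply]
print(mask_apply.tolist() == mask_transform.tolist())
print(filtered)
```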
Upvotes: 3