Biok17151

Reputation: 65

How to drop rows with a value of less than a percentage of the maximum per group

I have a pandas dataframe with a time series of a signal with some peaks identified:

Time (s) Intensity Peak              
1        1         a
2        10        a
3        30        a
4        100       a
5        40        a
6        20        a
7        2         a
1        20        b
2        100       b
3        300       b
4        80        b
5        20        b
6        2         b

I would like to drop the rows where the Intensity value is less than 10% of the maximum Intensity value for each peak in order to obtain:

Time (s) Intensity Peak
3        30        a
4        100       a
5        40        a
6        20        a
2        100       b
3        300       b
4        80        b

How would I do that? I tried looking for a groupby function that would do that but I just cannot seem to find something that fits. Thank you!

Upvotes: 2

Views: 890

Answers (2)

Ch3steR

Reputation: 20669

You could use GroupBy.transform with 'max' to broadcast each group's maximum back onto every row, then take 10% of it using Series.div. Compare df['Intensity'] against that threshold and use the resulting boolean Series for boolean indexing.

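# Per-row threshold: 10% of the maximum Intensity within each Peak group
max_vals = df.groupby('Peak')['Intensity'].transform('max').div(10)
# True for rows strictly above their group's threshold
mask     = df['Intensity'] > max_vals

df[mask]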

#      Time (s)  Intensity Peak
# 2         3         30    a
# 3         4        100    a
# 4         5         40    a
# 5         6         20    a
# 8         2        100    b
# 9         3        300    b
# 10        4         80    b
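
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. The data is copied from the question; the helper name drop_below_fraction and its fraction parameter are made up for illustration and are not part of pandas.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Time (s)": [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6],
    "Intensity": [1, 10, 30, 100, 40, 20, 2, 20, 100, 300, 80, 20, 2],
    "Peak": list("aaaaaaa") + list("bbbbbb"),
})


def drop_below_fraction(frame, fraction=0.1):
    """Keep rows whose Intensity is strictly above `fraction` of their Peak's maximum.

    Hypothetical helper for illustration only.
    """
    # transform('max') returns a Series aligned with frame's index,
    # holding each row's group maximum.
    group_max = frame.groupby("Peak")["Intensity"].transform("max")
    return frame[frame["Intensity"] > group_max * fraction]


filtered = drop_below_fraction(df)
print(filtered)
```

Note that the strict > comparison also drops a row sitting exactly at 10% of the maximum (Intensity 10 in peak a), which matches the expected output in the question. Switching to >= would keep that row.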

Upvotes: 3

user17242583

Reputation:

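Use groupby to generate a boolean mask. Passing group_keys=False keeps the original row index on the result; without it, pandas 2.0 and later prepend the group key to the index and the mask no longer aligns with df:

filtered = df[df.groupby('Peak', group_keys=False)['Intensity'].apply(lambda x: x > x.max() / 10)]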

Output:

>>> filtered
    Time (s)  Intensity Peak
2         3         30    a
3         4        100    a
4         5         40    a
5         6         20    a
8         2        100    b
9         3        300    b
10        4         80    b

Upvotes: 3
