Drop rows based on a threshold value of a column

Question

I indexed my data with Elasticsearch. For each input file, the match query uses the date of birth and last name. Several students share the same DOB, so their files come back in the output as well. My idea is to remove the rows with low scores. How can I approach this?

Filename  Name   DOB         Score      PageNumber
11086     Ram    11 06 1930  6.4504585  1
11086     Ram    11 06 1930  6.4504585  2
11086     Ram    11 06 1930  6.4504585  1
81564     Kiran  11 06 1930  3.5517883  2
81564     Kiran  11 06 1930  3.5517883  33
81564     Kiran  11 06 1930  3.5517883  12
754133    peter  11 06 1930  2.5905614  1
754133    peter  11 06 1930  2.5905614  1

Desired output

Filename  Name   DOB         Score      PageNumber
11086     Ram    11 06 1930  6.4504585  1
11086     Ram    11 06 1930  6.4504585  2
11086     Ram    11 06 1930  6.4504585  1

cs95 · Accepted Answer

Let's try std-based filtering: keep only the rows whose Score lies within one standard deviation of the maximum Score.

# Drop rows whose Score is more than one standard deviation away from the maximum
df = df[~((df.Score - df.Score.max()).abs() > df.Score.std())]
df

   Filename Name         DOB     Score  PageNumber
0     11086  Ram  11 06 1930  6.450458           1
1     11086  Ram  11 06 1930  6.450458           2
2     11086  Ram  11 06 1930  6.450458           1

df.Score.std() becomes the dynamic threshold for your data, so the cutoff adapts to how spread out the scores are.
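For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same filter. The DataFrame is typed in by hand from the question's sample data, and keep_near_max is a hypothetical helper name used only for illustration, not something pandas provides.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Filename": [11086, 11086, 11086, 81564, 81564, 81564, 754133, 754133],
    "Name": ["Ram", "Ram", "Ram", "Kiran", "Kiran", "Kiran", "peter", "peter"],
    "DOB": ["11 06 1930"] * 8,
    "Score": [6.4504585] * 3 + [3.5517883] * 3 + [2.5905614] * 2,
    "PageNumber": [1, 2, 1, 2, 33, 12, 1, 1],
})


def keep_near_max(frame, column="Score"):
    """Keep rows whose value is within one standard deviation of the column maximum.

    Hypothetical helper for illustration; it wraps the one-liner from the answer.
    """
    gap = frame[column].max() - frame[column]  # distance below the best score, never negative
    threshold = frame[column].std()            # sample standard deviation (pandas default, ddof=1)
    # Negating "gap > threshold" mirrors the answer's expression.
    return frame[~(gap > threshold)]


result = keep_near_max(df)
print(result)
```

One design detail: the negated comparison keeps a row when the threshold is NaN. With a single-row frame, std() returns NaN, the comparison is False, and the row survives. Writing the mask as gap <= threshold would drop it instead.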

The intermediate values for the sample data are:

((df.Score - df.Score.max()).abs())

0    0.000000
1    0.000000
2    0.000000
3    2.898670
4    2.898670
5    2.898670
6    3.859897
7    3.859897
Name: Score, dtype: float64

df.Score.std()
1.7451830491923459

df.Score.max()
6.4504584999999999
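As a sanity check on the numbers above, the threshold can be recomputed by hand with only the standard library. This is a sketch under the assumption that the eight scores from the question are the whole column. It uses the sample standard deviation (n - 1 in the denominator), which is what pandas' std() computes by default.

```python
import math
import statistics

# The eight Score values from the question.
scores = [6.4504585] * 3 + [3.5517883] * 3 + [2.5905614] * 2

mean = sum(scores) / len(scores)
# Sample variance: squared deviations from the mean, divided by n - 1.
variance = sum((s - mean) ** 2 for s in scores) / (len(scores) - 1)
threshold = math.sqrt(variance)

# Distance of each score below the maximum.
gaps = [max(scores) - s for s in scores]
# Keep a score unless its gap exceeds the threshold.
kept = [s for s, g in zip(scores, gaps) if not g > threshold]

print(round(threshold, 6))
print(sorted({round(g, 6) for g in gaps}))
print(kept)
```

The gaps of about 2.90 and 3.86 both exceed the threshold of about 1.75, so only the three rows with the maximum score remain.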
