Reputation: 624
With elasticsearch search Indexed the data. The input file much match query is dob and last name. It has student with same dob. So that file also coming as output. Have an idea about to remove low score rows. How can I approach?
Filename Name DOB Score PageNumber
11086 Ram 11 06 1930 6.4504585 1
11086 Ram 11 06 1930 6.4504585 2
11086 Ram 11 06 1930 6.4504585 1
81564 Kiran 11 06 1930 3.5517883 2
81564 Kiran 11 06 1930 3.5517883 33
81564 Kiran 11 06 1930 3.5517883 12
754133 peter 11 06 1930 2.5905614 1
754133 peter 11 06 1930 2.5905614 1
Desired output
Filename Name DOB Score PageNumber
11086 Ram 11 06 1930 6.4504585 1
11086 Ram 11 06 1930 6.4504585 2
11086 Ram 11 06 1930 6.4504585 1
Upvotes: 1
Views: 3054
Reputation: 403130
Let's try .std
based filtering.
df = df[~((df.Score - df.Score.max()).abs() > df.Score.std())]
df
Filename Name DOB Score PageNumber
0 11086 Ram 11 06 1930 6.450458 1
1 11086 Ram 11 06 1930 6.450458 2
2 11086 Ram 11 06 1930 6.450458 1
Score.std
becomes the dynamic threshold for your data.
Where,
((df.Score - df.Score.max()).abs())
0 0.000000
1 0.000000
2 0.000000
3 2.898670
4 2.898670
5 2.898670
6 3.859897
7 3.859897
Name: Score, dtype: float64
df.Score.std()
1.7451830491923459
df.Score.max()
6.4504584999999999
Upvotes: 3
Reputation: 294516
Supposing you want only rows with a score greater than 3
df.query('Score > 3')
Filename Name DOB Score PageNumber
0 11086 Ram 11 06 1930 6.450458 1
1 11086 Ram 11 06 1930 6.450458 2
2 11086 Ram 11 06 1930 6.450458 1
3 81564 Kiran 11 06 1930 3.551788 2
4 81564 Kiran 11 06 1930 3.551788 33
5 81564 Kiran 11 06 1930 3.551788 12
Supposing you want to filter by some multiple of standard deviation
df[df.Score > (df.Score.mean() - 1 * df.Score.std())]
Filename Name DOB Score PageNumber
0 11086 Ram 11 06 1930 6.450458 1
1 11086 Ram 11 06 1930 6.450458 2
2 11086 Ram 11 06 1930 6.450458 1
3 81564 Kiran 11 06 1930 3.551788 2
4 81564 Kiran 11 06 1930 3.551788 33
5 81564 Kiran 11 06 1930 3.551788 12
Or you can grab just the rows that equal the max
df.query('Score == @df.Score.max()')
Filename Name DOB Score PageNumber
0 11086 Ram 11 06 1930 6.450458 1
1 11086 Ram 11 06 1930 6.450458 2
2 11086 Ram 11 06 1930 6.450458 1
Or
df[df.Score == df.Score.max()]
Filename Name DOB Score PageNumber
0 11086 Ram 11 06 1930 6.450458 1
1 11086 Ram 11 06 1930 6.450458 2
2 11086 Ram 11 06 1930 6.450458 1
Upvotes: 4