Remove outliers from a certain column

Question

I have a DataFrame named bids_data.

bids_data:

   Supplier_ID  shiper_RFQ
0         2305        5000
1         2309        5200
2         2305        6500
3         2307        4500
4         2301         900
5         2302       10000
6         2306        4500

I want to remove the outlier rows based on shiper_RFQ and store them in another DataFrame. I tried converting shiper_RFQ to a list and then finding the outliers, but that doesn't work well.

Nihal · Accepted Answer

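You can filter on the z-score: a row is kept when its shiper_RFQ is less than threshold standard deviations away from the column mean (df here is your bids_data). If your data is fairly clean, you can use a tighter threshold such as 0.5.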

import numpy as np

threshold = 1
print(df[df['shiper_RFQ'].apply(lambda x: np.abs(x - df['shiper_RFQ'].mean()) / df['shiper_RFQ'].std() < threshold)])

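The same filter can be written without apply, operating on the whole column at once (note that this version overwrites df):

df = df[np.abs(df['shiper_RFQ'] - df['shiper_RFQ'].mean()) / df['shiper_RFQ'].std() < threshold]

Both select the same rows.

Output: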

   Supplier_ID  shiper_RFQ
0         2305        5000
1         2309        5200
2         2305        6500
3         2307        4500
6         2306        4500
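
The question also asks to keep the removed rows in a separate DataFrame, which the filter above does not do. A minimal sketch of one way to get both pieces, assuming the sample data from the question is rebuilt by hand; the names z_scores, is_outlier, outliers and cleaned are only illustrative:

```python
import pandas as pd

# Sample data copied from the question.
bids_data = pd.DataFrame({
    "Supplier_ID": [2305, 2309, 2305, 2307, 2301, 2302, 2306],
    "shiper_RFQ": [5000, 5200, 6500, 4500, 900, 10000, 4500],
})

threshold = 1  # maximum allowed distance from the mean, in standard deviations

col = bids_data["shiper_RFQ"]
# Absolute z-score of each value (pandas' std() is the sample std, ddof=1).
z_scores = (col - col.mean()).abs() / col.std()

# Boolean mask: True where the row counts as an outlier.
is_outlier = z_scores >= threshold

outliers = bids_data[is_outlier]   # rows at or beyond the threshold
cleaned = bids_data[~is_outlier]   # everything else

print(outliers)
print(cleaned)
```

Building the mask once and negating it with ~ means every row ends up in exactly one of the two DataFrames.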

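If you print the z-scores themselves (on the original, unfiltered df), you can see the anomalies: rows 4 and 5 are more than 1.5 standard deviations from the mean, while every other row is below 0.5.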

print(df['shiper_RFQ'].apply(lambda x: np.abs(x - df['shiper_RFQ'].mean()) / df['shiper_RFQ'].std()))

0    0.084182
1    0.010523
2    0.468261
3    0.268329
4    1.594192
5    1.757294
6    0.268329
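
If you compute these scores from a list or NumPy array instead of a pandas column, as the question mentions trying, the numbers may come out differently: pandas' std() defaults to the sample standard deviation (ddof=1), while NumPy's std() defaults to the population one (ddof=0). Whether that was the actual problem in the question is only a guess, since that code isn't shown. A small sketch of the difference, with illustrative variable names:

```python
import numpy as np
import pandas as pd

values = [5000, 5200, 6500, 4500, 900, 10000, 4500]
s = pd.Series(values)
arr = np.array(values)

# pandas: sample standard deviation (ddof=1) by default.
z_pandas = (s - s.mean()).abs() / s.std()

# NumPy: population standard deviation (ddof=0) by default.
z_numpy_default = np.abs(arr - arr.mean()) / arr.std()

# Passing ddof=1 makes NumPy agree with pandas.
z_numpy_sample = np.abs(arr - arr.mean()) / arr.std(ddof=1)

print(z_pandas.round(6).tolist())
print(np.allclose(z_pandas, z_numpy_sample))   # same definition of std
print(np.allclose(z_pandas, z_numpy_default))  # different definition of std
```

With only seven rows the two definitions differ by about 8%, which can matter for values near the threshold.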
