SH_IQ

Reputation: 709

How to remove the following outliers?

I have the data below, and I am practicing how to remove outliers from it:

My_Data

After inspecting my data, I found that it has no missing or duplicated values, but it does have a lot of outliers, as the figure below shows:

Data_visualization

I drew a boxplot for the fund_A column, shown below:

fund_A

Then I applied the IQR method, as in the code below:

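# Quartiles and interquartile range of fund_A
Q1 = bank['fund_A'].quantile(0.25)
Q3 = bank['fund_A'].quantile(0.75)
IQR = Q3 - Q1
# Fences at 1.5 * IQR beyond the quartiles
lower_lim = Q1 - 1.5*IQR
upper_lim = Q3 + 1.5*IQR
# Boolean masks marking values below and above the fences
outliers_15_low = (bank['fund_A'] < lower_lim)
outliers_15_up = (bank['fund_A'] > upper_lim)
# Number of values that are not flagged as outliers
len(bank['fund_A']) - (len(bank['fund_A'][outliers_15_low])+len(bank['fund_A'][outliers_15_up]))
# The flagged values
bank['fund_A'][(outliers_15_low|outliers_15_up)]
# The values inside the fences (the result is not assigned to anything here)
bank['fund_A'][~(outliers_15_low|outliers_15_up)]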

When I replot my data, it still shows some outliers, as below:

fund_A_replot

Could you please guide me? Am I on the right track? How can I remove the outliers completely? And do I need to apply the same procedure to the other columns? I am a beginner in this topic.

Upvotes: 0

Views: 206

Answers (2)

letdatado

Reputation: 241

If you have a lot of outliers, then try not to think of those as outliers...

I understand that you are doing this for practice. I think you should try a few scaling techniques on this data and then see their impact.
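As a rough illustration of what "trying a few scaling techniques" could look like, here is a minimal sketch. The data is made up (a right-skewed column standing in for fund_A, since the real dataset is not shown), the helper `count_iqr_outliers` is a hypothetical name, and the two transforms (a log transform and median/IQR scaling) are just example choices, not the only options.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: a right-skewed column named fund_A.
rng = np.random.default_rng(0)
bank = pd.DataFrame({"fund_A": rng.lognormal(mean=3.0, sigma=1.0, size=1000)})


def count_iqr_outliers(s: pd.Series) -> int:
    """Count values outside the 1.5 * IQR fences used in the question."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return int(((s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)).sum())


# Log transform: compresses a long right tail (log1p needs values > -1).
log_scaled = np.log1p(bank["fund_A"])

# Robust scaling: centre on the median and divide by the IQR.
q1, q3 = bank["fund_A"].quantile([0.25, 0.75])
robust_scaled = (bank["fund_A"] - bank["fund_A"].median()) / (q3 - q1)

# Compare how many points each version flags.
print("raw:   ", count_iqr_outliers(bank["fund_A"]))
print("log1p: ", count_iqr_outliers(log_scaled))
print("robust:", count_iqr_outliers(robust_scaled))
```

On skewed data like this, the log transform should flag far fewer points, while the median/IQR scaling only changes the units: it is a linear rescaling, so the same points stay outside the fences. Whether a transform is appropriate depends on what fund_A actually measures.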

Best of luck.

Upvotes: 1

yash bhangare

Reputation: 319

Check whether the fund_A column contains values that are much smaller or much larger than the rest. Try to identify those values, then remove them if possible or normalize them.
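A minimal sketch of that check, assuming a pandas DataFrame called bank with a fund_A column as in the question. The numbers here are invented for illustration, and `is_outlier` and `bank_clean` are hypothetical names.

```python
import pandas as pd

# Hypothetical toy column standing in for fund_A; the real data is not shown.
bank = pd.DataFrame({"fund_A": [10, 12, 11, 13, 12, 11, 14, 10, 13, 250, -90]})

# Look at the smallest and largest values next to the summary statistics.
print(bank["fund_A"].describe())
print(bank["fund_A"].nsmallest(3))
print(bank["fund_A"].nlargest(3))

# Flag values outside the 1.5 * IQR fences.
q1 = bank["fund_A"].quantile(0.25)
q3 = bank["fund_A"].quantile(0.75)
iqr = q3 - q1
is_outlier = (bank["fund_A"] < q1 - 1.5 * iqr) | (bank["fund_A"] > q3 + 1.5 * iqr)

# Keep the result: filtering returns a new object and leaves bank unchanged.
bank_clean = bank.loc[~is_outlier].reset_index(drop=True)
print(len(bank), "->", len(bank_clean))
```

The filtered rows have to be assigned to a variable (here bank_clean) and the plot drawn from that variable; otherwise the original data is plotted again. Also note that a boxplot of the filtered data computes its own quartiles, so it may still mark a few points beyond the new, narrower whiskers even though the first pass worked.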

We could help more if you shared the dataset, or at least that column.

Upvotes: 1
