Kitooos

Reputation: 37

Outlier Handling in data mining

I have one outlier in the Body Mass Index column that is very far from the rest of the data. The second-largest value is 38.1, whereas the outlier is 294. The true value is actually 29.4; the error occurred while collecting the data. I don't want to delete the row because I have a limited amount of data. Can anyone suggest the best technical approach to this problem? Is it a good idea to treat the value as missing and apply a method such as Expectation Maximization imputation or Bayesian multiple imputation? Please help me solve the issue. Thanks

Upvotes: 0

Views: 61

Answers (2)

Steffen Moritz

Reputation: 7730

Yes, if it really is an outlier, it is fine to remove the value and use imputation techniques to replace it.

Be sure that you understand the concept of multiple imputation (MI) before using it. To use MI correctly, you also have to change the processing steps that follow the imputation itself. (If you are using R, you can have a look at the mice package.)

If you don't want to work with multiply imputed datasets, EM-based imputation algorithms are a solid choice. (If you are using R, you can look into the packages VIM or imputeR.)
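To illustrate the "treat it as missing, then impute" workflow, here is a minimal Python sketch using only the standard library. It is not EM or multiple imputation: it flags the value with a median/MAD-based modified z-score and fills it with the median of the remaining values, which is a much simpler single imputation. The BMI values, the function names, and the 3.5 threshold are all assumptions made up for this example; only 294 and 38.1 come from the question.

```python
import statistics

def flag_outliers(values, threshold=3.5):
    """Return indices whose modified z-score (median/MAD based) exceeds threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to measure against, so flag nothing
    # 0.6745 rescales the MAD so the score is comparable to a normal z-score
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]

def impute_with_median(values, bad_indices):
    """Replace flagged entries with the median of the unflagged entries."""
    bad = set(bad_indices)
    observed = [v for i, v in enumerate(values) if i not in bad]
    fill = statistics.median(observed)
    return [fill if i in bad else v for i, v in enumerate(values)]

# Hypothetical BMI column; 294.0 stands in for the mis-recorded 29.4
bmi = [22.4, 27.9, 31.0, 24.6, 294.0, 38.1, 26.3, 29.8]

flagged = flag_outliers(bmi)
cleaned = impute_with_median(bmi, flagged)
print(flagged)
print(cleaned)
```

On this toy column only the 294 entry is flagged; 38.1 stays untouched. With real data you would normally impute from the other columns as well, which is what the EM and MI methods do.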

Upvotes: 0

Has QUIT--Anony-Mousse

Reputation: 77454

Detect the bad data and, if necessary, replace it with any data imputation technique you like.

Of course, it is better if you can just leave the bad data in and design your overall approach to be robust enough to handle it.
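As a rough illustration of what "robust enough" can mean, the sketch below compares how far the mean and the median move when the mis-recorded value is left in. The BMI values are invented for the example (only 294 and 29.4 come from the question), and using the median is just one possible robust choice, not something the answer prescribes.

```python
import statistics

# Hypothetical BMI column, once with the recording error and once corrected
bmi_with_error = [22.4, 27.9, 31.0, 24.6, 294.0, 38.1, 26.3, 29.8]
bmi_corrected = [22.4, 27.9, 31.0, 24.6, 29.4, 38.1, 26.3, 29.8]

# How much each location estimate changes because of the single bad value
mean_shift = abs(statistics.mean(bmi_with_error) - statistics.mean(bmi_corrected))
median_shift = abs(statistics.median(bmi_with_error) - statistics.median(bmi_corrected))

print(f"mean shifts by {mean_shift:.2f}")
print(f"median shifts by {median_shift:.2f}")
```

The mean is pulled far off by the one bad entry while the median barely moves, so a method built on medians or ranks can tolerate the error without any cleaning step.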

Upvotes: 1
