Huragok

Reputation: 1

How can I filter outliers in data that is manually recorded?

Different people have to write down values for a certain parameter in order to fill out a table, and people inevitably make mistakes when writing, sometimes by a factor of 1000. This creates a lot of outliers that are not related to the data itself but are instead caused by human error.

Is there any appropriate method for dealing with such outliers? It's difficult to distinguish "real" outliers from ones that were caused by human error.

The data in question:

What is an ideal way of filtering such outliers and excluding them from the data?

I have tried to:

  1. Manually remove outliers: I stopped doing this because it is a subjective form of filtering, essentially "illegal" I believe. It's also difficult to distinguish "real" outliers from human-caused ones.
  2. Box plot: I have tried calculating Q1, Q3, the IQR, the lower limit (LL = Q1 - 1.5*IQR) and the upper limit (UL = Q3 + 1.5*IQR). I then filter out any data that ends up outside the whiskers. The "problem", I suppose, is that the data is non-negative. A disproportionate number of outliers are removed above the upper limit, but none are removed below the lower limit, because LL always becomes negative and I have no negative data.
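
The box-plot filter from item 2 can be sketched roughly as follows. This is a minimal Python sketch, not the code actually used: the function names `tukey_fences` and `split_by_fences` and the sample `readings` are made up for illustration, and the quartiles use the standard library's "inclusive" interpolation, which may differ slightly from other quartile conventions.

```python
import statistics

def tukey_fences(values, k=1.5):
    """Return (LL, UL) with LL = Q1 - k*IQR and UL = Q3 + k*IQR."""
    # "inclusive" interpolation is an assumption; other conventions shift Q1/Q3 slightly
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def split_by_fences(values, k=1.5):
    """Split values into those inside the fences and those outside."""
    lower, upper = tukey_fences(values, k)
    kept = [v for v in values if lower <= v <= upper]
    flagged = [v for v in values if v < lower or v > upper]
    return kept, flagged

# Hypothetical non-negative readings; 2500.0 mimics a factor-of-1000 slip for 2.5
readings = [1.2, 0.8, 2.5, 3.1, 0.5, 1.9, 2500.0, 2.2, 0.9, 4.0]

lower, upper = tukey_fences(readings)
kept, flagged = split_by_fences(readings)

print("LL:", lower, "UL:", upper)  # LL is negative for this sample
print("flagged:", flagged)
```

With this sample the lower limit comes out negative, so only the large value is flagged, which is the one-sided behaviour described in item 2.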

Is it "good enough" to just stick with box plots? Or are there actual statistical methods that can be used to objectively "remove" outliers?

Upvotes: 0

Views: 37

Answers (0)
