ling12

Reputation:

Deleting outliers in R

I have a large data set from an Excel file (saved as CSV) that contains trials (X) and times (Y). I know there is code to remove single outliers within a trial using the chi-square test. However, I want to remove every column that contains an outlier, while leaving the rest of the data in the file untouched. I am having a tough time finding or writing code that does this. Are there any suggestions?

Upvotes: 1

Views: 1651

Answers (1)

gung - Reinstate Monica

Reputation: 11893

Given your response to @user603, I gather you want to delete an entire X-variable from your dataset if even one observation is an outlier on that variable. This is trivial to do in R. Use your preferred strategy to identify the outliers, assign the indices of the affected columns to a variable, and drop those columns:

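outs = c(...)                       # indices of the columns that contain outliers
data = data[, -outs, drop = FALSE]  # drop those columns; all rows are kept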
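
As a concrete illustration of the subsetting above, here is a minimal sketch. The data frame, the helper `has_outlier`, and the 1.5 × IQR rule are all assumptions made for the example; they are not the chi-square test mentioned in the question, and any other detection rule could be swapped in. The guard matters because negative indexing with an empty vector would drop every column rather than none.

```r
# Hypothetical data: three trials (columns) of times; trial2 holds one extreme value
data <- data.frame(
  trial1 = c(10, 11, 12, 11, 10, 12),
  trial2 = c(10, 11, 12, 11, 10, 95),
  trial3 = c(9, 10, 11, 10, 9, 11)
)

# Assumed rule: a value is an outlier if it lies more than
# 1.5 * IQR below the first quartile or above the third quartile
has_outlier <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- 1.5 * (q[2] - q[1])
  any(x < q[1] - fence | x > q[2] + fence, na.rm = TRUE)
}

# Indices of the columns that contain at least one outlier
outs <- which(vapply(data, has_outlier, logical(1)))

# Only subset when something was flagged: data[, -integer(0)] selects no columns
if (length(outs) > 0) {
  cleaned <- data[, -outs, drop = FALSE]
} else {
  cleaned <- data
}

names(cleaned)
```

The original `data` object is left intact here, so the flagged columns can still be inspected afterwards.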

Alternatively, you could just not include those variables in your model formula and leave the data.frame as it is.
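
A rough sketch of that alternative, assuming a response called `y` and predictors `x1` to `x3` (all hypothetical names, with simulated values): the flagged variable is simply left out of the formula, and the data frame itself is never modified.

```r
set.seed(1)

# Hypothetical data frame: a response y and three candidate predictors
df <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20))

# Suppose x2 was flagged by whatever outlier rule you prefer (assumption)
flagged <- "x2"

# Build the formula from the remaining predictors; df keeps all its columns
keep <- setdiff(names(df), c("y", flagged))
fit  <- lm(reformulate(keep, response = "y"), data = df)

names(coef(fit))
```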


On a different note, I think this is a very bad idea, and I suspect that there must be some confusion prompting you to believe this is something you should do. Let me lay out a few things:

  1. It usually doesn't make sense to think of covariates as having outliers. We typically think of outliers as being in the response variable, in which case one possibility would be to delete rows (i.e., data = data[-outs,], where outs now holds row indices).
  2. If you do have outliers, deleting observations is generally the worst of your possible options. It would be much better to use a robust loss function, such as Tukey's bisquare.
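
To illustrate the second point, here is a small sketch of a robust fit using `rlm()` from the MASS package with `psi = psi.bisquare`, which is one way to get Tukey's bisquare loss in R. The simulated data and the single contaminated value are assumptions made up for the example; no observation is deleted, the suspicious one is just down-weighted.

```r
library(MASS)  # recommended package, normally installed alongside R

set.seed(42)

# Simulated data (assumption): a linear trend plus noise
x <- 1:30
y <- 2 + 0.5 * x + rnorm(30, sd = 0.5)
y[30] <- 60  # contaminate one response value

# Ordinary least squares versus a robust fit with Tukey's bisquare
ols    <- lm(y ~ x)
robust <- rlm(y ~ x, psi = psi.bisquare, maxit = 50)

coef(ols)
coef(robust)

# Final IWLS weights: values near 0 mark observations the fit largely ignored
round(robust$w, 2)
```

Comparing the two sets of coefficients shows how much a single point can pull the least-squares line, while the weights show which observations the robust fit treated as suspect.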

Upvotes: 11
