Reputation:
I have a large data set from an Excel file (saved as CSV) that contains trials (X) and times (Y). I know there is code for removing a single outlier within a trial using a chi-squared test. However, I want to remove the entire column that contains an outlier, while leaving the rest of the data in the file untouched. I am having a tough time finding or writing code that does this. Any suggestions?
Upvotes: 1
Views: 1651
Reputation: 11893
Given your response to @user603, I gather you want to delete an entire X-variable (column) from your dataset if even a single observation on that variable is an outlier. This is trivial to do in R. Use your preferred strategy to identify the outliers, and store the indices of the offending columns in a vector:
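outs = c(...)   # indices of the columns flagged as containing outliers
if (length(outs) > 0) data = data[, -outs, drop = FALSE]   # guard: data[, -integer(0)] would drop ALL columns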
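A minimal, self-contained sketch of the column-dropping approach, assuming made-up data (columns `trial1` to `trial3`, one per trial). In place of the chi-squared test mentioned in the question, it swaps in Tukey's 1.5 × IQR fences as the outlier rule; `has_outlier` is a hypothetical helper, so substitute whatever detection method you actually prefer.

```r
# Hypothetical example data: one column per trial (X), one row per time (Y).
data = data.frame(
  trial1 = 1:10,
  trial2 = c(1:9, 100),                        # contains one obvious outlier
  trial3 = c(5, 3, 8, 1, 9, 2, 7, 4, 10, 6)
)

# TRUE if any value lies outside Tukey's fences,
# i.e. more than k * IQR below the first or above the third quartile.
has_outlier = function(x, k = 1.5) {
  q = quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr = q[[2]] - q[[1]]
  any(x < q[[1]] - k * iqr | x > q[[2]] + k * iqr, na.rm = TRUE)
}

# Indices of the columns containing at least one outlier.
outs = which(vapply(data, has_outlier, logical(1)))

# Guard: if outs is empty, data[, -outs] would drop every column.
if (length(outs) > 0) data = data[, -outs, drop = FALSE]

print(names(data))   # [1] "trial1" "trial3"
```

Only `trial2` is removed; the other trials are left exactly as they were.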
Alternatively, you could just not include those variables in your model formula and leave the data.frame as it is.
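As a sketch of the formula-based alternative, assume a hypothetical data frame `dat` with response `y` and predictors `x1` to `x3`, where `x2` has been flagged by some earlier screening step. The model is fit without that predictor while `dat` itself stays intact:

```r
# Hypothetical data: response y and three candidate predictors.
dat = data.frame(
  y  = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 20.2),
  x1 = 1:10,
  x2 = c(1:9, 100),
  x3 = c(5, 3, 8, 1, 9, 2, 7, 4, 10, 6)
)

# Predictors flagged by an earlier outlier check (hard-coded here for illustration).
flagged = c("x2")

# Build the model formula from the remaining predictors only.
predictors = setdiff(setdiff(names(dat), "y"), flagged)
f = reformulate(predictors, response = "y")
print(f)

# Fit the model; dat itself is left unchanged (still 4 columns).
fit = lm(f, data = dat)
print(coef(fit))
```

Writing `lm(y ~ . - x2, data = dat)` by hand gives the same model; `reformulate()` is handier when the flagged names come out of code rather than being typed in.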
On a different note, I think this is a very bad idea, and I suspect some confusion is prompting you to believe it is something you should do. Let me lay out a few things:
data = data[-outs,]   # by contrast, this drops rows (observations), with outs holding row indices
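For comparison, dropping the offending observations (rows) rather than whole variables could look like the sketch below. It again uses hypothetical data and swaps in Tukey's 1.5 × IQR fences as the outlier rule; `is_outlier` is an illustrative helper, not a library function.

```r
# Hypothetical example data: one column per trial, one row per time.
data = data.frame(
  trial1 = 1:10,
  trial2 = c(1:9, 100),                        # row 10 holds an outlier
  trial3 = c(5, 3, 8, 1, 9, 2, 7, 4, 10, 6)
)

# Elementwise TRUE where a value lies outside Tukey's fences (k * IQR).
is_outlier = function(x, k = 1.5) {
  q = quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr = q[[2]] - q[[1]]
  x < q[[1]] - k * iqr | x > q[[2]] + k * iqr
}

# Logical matrix (rows x trials), then the rows with an outlier in any column.
flags = sapply(data, is_outlier)
outs = which(rowSums(flags, na.rm = TRUE) > 0)

# Drop those rows; the guard avoids emptying the data when outs is empty.
if (length(outs) > 0) data = data[-outs, , drop = FALSE]
print(nrow(data))   # [1] 9
```

Here a single extreme time costs one row, whereas dropping the column would discard the whole trial (all ten times).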
Upvotes: 11