XCeptable

Reputation: 1267

Remove outliers from data frame in R?

I am trying to remove outliers from my data. In my case, the outliers are the values that lie far from the rest of the data when plotted on a boxplot. After removing the outliers, I will save the data to a new file and run a prediction model to see how much the results differ from those on the original data.

I followed a tutorial and adapted it to remove outliers from my data. The tutorial uses a boxplot to identify the outliers.

It works fine when I run it on a column that has outliers, but it raises an error when I run it on a column that doesn't have outliers. How can I fix this error?

Here is code:

outlier_rem <- Data_combined #data-frame with 25 var, few have outliers

#removing outliers from the column

outliers <- boxplot(outlier_rem$var1, plot=FALSE)$out
#print(outliers)
ol <- outlier_rem[-which(outlier_rem$var1 %in% outliers),]

dim(ol)
# [1]  0 25
boxplot(ol)

This produces the following warnings and error:

no non-missing arguments to min; returning Inf
no non-missing arguments to max; returning -Inf
Error in plot.window(xlim = xlim, ylim = ylim, log = log, yaxs = pars$yaxs) : 
  need finite 'ylim' values

Upvotes: 1

Views: 5639

Answers (1)

Maurits Evers

Reputation: 50668

The following works:

# Sample data based on mtcars and one additional row
df <- rbind(mtcars[, 1:3], c(100, 6, 300))

# Identify outliers        
outliers <- boxplot(df$mpg, plot = FALSE)$out
#[1]  33.9 100.0

# Remove outliers
df[!(df$mpg %in% outliers), ]
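The same logical-index approach can be wrapped in a small function so it can be reused for any column. This is only a sketch: `remove_outliers` is a hypothetical helper name, not something from the tutorial or from base R, and it relies on the default 1.5 * IQR rule that boxplot() uses.

```r
# Hypothetical helper (an assumption for illustration, not part of base R):
# drop the rows whose value in `col` is flagged as an outlier by boxplot().
remove_outliers <- function(data, col) {
  out <- boxplot(data[[col]], plot = FALSE)$out
  # Logical indexing: an empty `out` gives all FALSE, so !FALSE keeps every row
  data[!(data[[col]] %in% out), , drop = FALSE]
}

# Same sample data as above: mtcars plus one extreme row
df <- rbind(mtcars[, 1:3], c(100, 6, 300))

cleaned   <- remove_outliers(df, "mpg")      # column with outliers
untouched <- remove_outliers(mtcars, "mpg")  # column without outliers

nrow(cleaned)    # fewer rows than df
nrow(untouched)  # same number of rows as mtcars
```

Because the function never calls which(), it behaves the same whether or not the column contains outliers.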

Your method fails because, when there are no outliers, which(mtcars$mpg %in% numeric(0)) returns integer(0). Negating an empty index vector still gives integer(0), and indexing rows with integer(0) selects nothing, so you end up with a zero-row data.frame, which is exactly what you see from dim.

outliers <- boxplot(mtcars$mpg, plot = FALSE)$out
outliers
#numeric(0)

Compare

which(mtcars$mpg %in% outliers)
#integer(0)

with

df$mpg %in% outliers
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
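To see the difference end to end, here is a minimal sketch on mtcars, where no mpg value is flagged as an outlier. The variable names (`out`, `idx`, `by_which`, and so on) are chosen only for illustration. The last line shows one possible way to keep a which()-style index safe, using setdiff() on the row numbers.

```r
out <- boxplot(mtcars$mpg, plot = FALSE)$out  # numeric(0): nothing is flagged
idx <- which(mtcars$mpg %in% out)             # integer(0)

# Negating an empty index vector leaves it empty ...
neg_is_empty <- identical(-idx, idx)

# ... so negative indexing selects zero rows instead of dropping zero rows
by_which <- mtcars[-idx, ]

# The logical index is all FALSE, and !FALSE keeps every row
by_logical <- mtcars[!(mtcars$mpg %in% out), ]

# Alternative that still uses the which() result: keep the complement of idx
by_setdiff <- mtcars[setdiff(seq_len(nrow(mtcars)), idx), ]

dim(by_which)
dim(by_logical)
dim(by_setdiff)
```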

There exists a nice post here on SO that elaborates on this point.

Upvotes: 7
