Aparna S

Reputation: 23

Which function in R can I use to remove outliers from a dataset?

I have a dataset with multiple species, and each species has multiple individuals. The column format is as follows:

    S Ind    X  Y
    A 1     ax1 ay1
    A 2     ax2 ay2
    B 1     bx1 by1 

I want to make an X-Y plot for each species after removing any outliers from the two columns X and Y. I have used

`outliers <- boxplot(df$X, plot = FALSE)$out`

to identify the outliers in each of the two columns (the line above shows X). I have also written a `for` loop that does this for each species. To remove them from the dataset I am using

`df2 <- df[-which(df$X %in% outliers), ]`

The problem arises when no outliers are identified, or when they cannot be calculated because the sample size is too small. In that case `outliers` is empty, `which()` returns `integer(0)`, and `df[-integer(0), ]` selects no rows at all, so `df2` comes back as an empty data frame. Could someone please help me understand how else I can achieve this?

Upvotes: 0

Views: 205

Answers (2)

akrun

Reputation: 887741

We can use `subset()` in base R:

subset(mtcars, !cyl %in% c(4, 6))
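Applied to the question's layout, a minimal sketch of per-species removal in both X and Y. The toy data frame and the helper `drop_outliers()` are hypothetical, and `split()`/`lapply()`/`do.call(rbind, ...)` stand in for the question's `for` loop:

```r
# Hypothetical toy data in the question's layout; species B has only
# two individuals, so no outliers are flagged for it
df <- data.frame(
  S   = c("A", "A", "A", "A", "A", "B", "B"),
  Ind = c(1, 2, 3, 4, 5, 1, 2),
  X   = c(1.0, 1.2, 0.9, 1.1, 25.0, 3.0, 3.1),
  Y   = c(2.0, 2.1, 1.9, 2.2, 2.0, 5.0, 5.2)
)

# Hypothetical helper: drop rows whose X or Y is a boxplot outlier
# within one species
drop_outliers <- function(d) {
  out_x <- boxplot(d$X, plot = FALSE)$out
  out_y <- boxplot(d$Y, plot = FALSE)$out
  # An empty out_x or out_y makes %in% all FALSE, so those rows are kept
  subset(d, !X %in% out_x & !Y %in% out_y)
}

# Run the helper on each species and stack the results
df2 <- do.call(rbind, lapply(split(df, df$S), drop_outliers))
rownames(df2) <- NULL
df2
```

Species A loses the row with X = 25, while species B keeps both rows even though no outliers could be flagged for it.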

Upvotes: 0

Ronak Shah

Reputation: 389225

Negate the `%in%` result instead of using `-which()` to remove the outliers:

df2 <- df[!df$X %in% outliers, ]

For example, with the `mtcars` dataset:

mtcars[!mtcars$cyl %in% c(4, 6), ]

#                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
#Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
#Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
#Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
#Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
#Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
#Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
#Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
#Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
#AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
#Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
#Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
#Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
#Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8

With this approach, when nothing matches (a value that is absent from the column, or an empty `outliers` vector), all the rows are returned:

mtcars[!mtcars$cyl %in% 5, ]
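To see why the question's `-which()` version fails, compare the two forms on `mtcars` with a value that never occurs (there are no 5-cylinder cars). A minimal sketch:

```r
# No car in mtcars has 5 cylinders, so the match is empty
idx <- which(mtcars$cyl %in% 5)
length(idx)                          # 0: which() returns integer(0)

# -integer(0) is still integer(0), so this selects zero rows
nrow(mtcars[-idx, ])                 # 0

# The negated logical vector is all TRUE, so every row is kept
nrow(mtcars[!mtcars$cyl %in% 5, ])   # 32
```

The logical index never has this problem, because it always has one element per row.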

Upvotes: 1
