I have a dataset with multiple species, each having multiple individuals. The columns are laid out as follows:
S Ind X Y
A 1 ax1 ay1
A 2 ax2 ay2
B 1 bx1 by1
I want to make an x-y plot for each species after removing any outliers from the two columns X and Y. I have used
`outliers <- boxplot(df$X, plot = FALSE)$out`
to identify the outliers in each of the two columns, and a for loop that repeats this for each species. To remove them from the dataset I am using
`df2 <- df[-which(df$X %in% outliers), ]`
The problem arises when no outliers are identified, or when they cannot be calculated because the sample size is too small. In that case `outliers` is empty and `df2` is returned as an empty data frame. How else can I achieve this?
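Negate the `%in%` result instead of using `-which()`. When `outliers` is empty, `which()` returns `integer(0)`, and `df[-integer(0), ]` selects no rows, which is why `df2` comes back empty. A negated logical index keeps every row in that case:
df2 <- df[!df$X %in% outliers, ]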
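A sketch of the same idea applied to the question's setup, assuming a row should be dropped when its X *or* its Y value is a boxplot outlier within its own species. The data frame and the helper `remove_outliers()` are hypothetical; using `boxplot.stats()` with `split()`/`lapply()` instead of an explicit for loop is a design choice, not a requirement:

```r
# Hypothetical data in the question's format: S = species, Ind = individual.
# Species A has one X outlier (100) and one Y outlier (-100);
# species B has only three rows, so boxplot.stats() finds no outliers there.
df <- data.frame(
  S   = c(rep("A", 20), rep("B", 3)),
  Ind = c(1:20, 1:3),
  X   = c(1:19, 100, 1, 2, 3),
  Y   = c(1, 2, -100, 4:20, 4, 5, 6)
)

# Hypothetical helper: drop rows whose X or Y value is a boxplot outlier.
# boxplot.stats(x)$out gives the same values as boxplot(x, plot = FALSE)$out
# for a single vector, without drawing anything.
remove_outliers <- function(d) {
  out_x <- boxplot.stats(d$X)$out
  out_y <- boxplot.stats(d$Y)$out
  # If out_x and out_y are empty, %in% is all FALSE, the negation is all TRUE,
  # and every row of this species is kept.
  d[!(d$X %in% out_x | d$Y %in% out_y), ]
}

# Apply per species, then stack the cleaned subsets back together.
df2 <- do.call(rbind, lapply(split(df, df$S), remove_outliers))
```

Species B keeps all three of its rows instead of being emptied, while species A loses only the two rows holding 100 and -100. Each species can then be plotted from `df2` as before.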
For example, with the `mtcars` dataset:
mtcars[!mtcars$cyl %in% c(4, 6), ]
#                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
#Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
#Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
#Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
#Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
#Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
#Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
#Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
#Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
#AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
#Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
#Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
#Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
#Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
With this approach, when none of the values are present (no car in `mtcars` has 5 cylinders), all rows are returned:
mtcars[!mtcars$cyl %in% 5, ]
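For comparison, the `-which()` pattern from the question fails on exactly this case. A quick check on the same `mtcars` data (the variable name `idx` is just for illustration):

```r
# mtcars ships with base R (datasets package); no car has 5 cylinders.
idx <- which(mtcars$cyl %in% 5)
length(idx)                          # 0: which() found no matches
nrow(mtcars[-idx, ])                 # 0: -integer(0) is still integer(0), which selects no rows
nrow(mtcars[!mtcars$cyl %in% 5, ])   # 32: the logical index is all TRUE, so every row is kept
```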