Reputation: 3
In the dataset I am working on, there are 24 variables, and all of them contain the outlier value 99. I need to remove 99 from all of these variables. Is there a quick way to do this? I can do it one variable at a time using:
education <- subset(ex1, ex1$education<99)
ex1 is my dataset. Do I need to use data.frame to do this?
Upvotes: 0
Views: 1186
Reputation: 1446
Are you really talking about outliers, or about a flag value of 99 that you want to remove? The latter would simply be:
ex1[ex1 == 99] <- NA
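As a minimal sketch of how that one-liner behaves, assuming a small made-up data.frame (the column names and values below are hypothetical, not from the question): the comparison ex1 == 99 yields a logical matrix covering every column, so a single assignment recodes all variables at once. If rows with a flag should then be dropped entirely, na.omit() can follow.

```r
# Hypothetical toy data; 99 stands in for the flag value
ex1 <- data.frame(
  education = c(12, 99, 16, 10),
  income    = c(3, 4, 99, 2),
  age       = c(25, 31, 40, 99)
)

# Logical-matrix indexing replaces every 99 in every column at once
ex1[ex1 == 99] <- NA

# Optionally drop every row that still contains an NA
ex1_complete <- na.omit(ex1)
```

Note that na.omit() would also drop rows that were NA for other reasons, so this assumes 99 is the only source of missing values.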
Upvotes: 0
Reputation: 2001
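Try this:
# assuming ex1 is a data.frame
# if you want to remove the 99s completely
# (the result is a list when the columns end up with different lengths)
ex.wo.outliers <- sapply(ex1, function(x) subset(x, x != 99))
# if you want to keep the 99s as NAs
# (sapply returns a matrix here; wrap it in as.data.frame() if you need a data.frame)
ex.withsub <- sapply(ex1, function(x) ifelse(x == 99, NA, x))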
The first drops the 99s from each variable separately; the second handles all of your variables at once and turns the 99s into NA.
I recommend the second, as it preserves the dimensions of your data. The first will result in different lengths for each variable if a row has some 99s and some valid values.
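To illustrate the difference in shape, here is a small sketch on made-up data (the two columns are hypothetical). It uses lapply rather than sapply so that the result type does not depend on the data; that substitution is mine, not part of the answer above.

```r
# Hypothetical toy data with a different number of 99s per column
ex1 <- data.frame(a = c(1, 99, 3), b = c(99, 99, 6))

# Dropping 99s column by column: each variable ends up with its own length
dropped <- lapply(ex1, function(x) x[x != 99])
lengths(dropped)

# Recoding 99 to NA instead keeps every column the same length
recoded <- as.data.frame(lapply(ex1, function(x) ifelse(x == 99, NA, x)))
dim(recoded)
```

Because the dropped columns no longer line up row by row, they cannot be put back into a data.frame without padding, which is the practical reason to prefer the NA recoding.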
Upvotes: 1
Reputation: 78792
Definitely suggest using a data.frame, and if you want to remove all rows with 99 then you can do:
ex1 <- data.frame(
a = sample(90:99,100, replace=TRUE),
b = sample(90:99,100, replace=TRUE),
c = sample(90:99,100, replace=TRUE),
d = sample(90:99,100, replace=TRUE),
e = sample(90:99,100, replace=TRUE),
f = sample(90:99,100, replace=TRUE)
)
print(nrow(ex1))
ex1 <- ex1[complete.cases(sapply(ex1, function(val) ifelse(val == 99, NA, val))),]
print(nrow(ex1))
(The print() calls are just to show that the number of rows changes.)
Otherwise, you should use @infominer's suggestion (which was literally just edited to do a simpler/alternate version of the remove).
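For comparison, a possible alternative sketch of the same row filter that skips the NA round trip: count the 99s per row with rowSums() and keep only the rows where the count is zero. The seed, the three-column data and the names keep and ex1_clean are assumptions for illustration, and the approach assumes the data contains no NA values to begin with.

```r
set.seed(1)  # assumed seed, only so the example is reproducible
ex1 <- data.frame(
  a = sample(90:99, 100, replace = TRUE),
  b = sample(90:99, 100, replace = TRUE),
  c = sample(90:99, 100, replace = TRUE)
)

# TRUE for rows in which no variable equals 99
keep <- rowSums(ex1 == 99) == 0

# Keep only those rows
ex1_clean <- ex1[keep, ]

# Same rows as the complete.cases() approach
via_cc <- ex1[complete.cases(sapply(ex1, function(val) ifelse(val == 99, NA, val))), ]
```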
Upvotes: 2