Reputation: 4929
I have a dataset that looks like this, except it's much longer and with many more values:
dataset <- data.frame(grps = c("a","b","c","a","d","b","c","a","d","b","c","a"), response = c(1,4,2,6,4,7,8,9,4,5,0,3))
In R, I would like to remove all rows containing the values "b" or "c" using a vector of values to remove, i.e.
remove<-c("b","c")
The actual dataset is very long, with many hundreds of values to remove, so removing values one by one would be very time-consuming.
Upvotes: 1
Views: 2789
Reputation: 27359
There's also subset:
subset(dataset, !(grps %in% remove))
... which is really just a wrapper around [ that lets you skip writing dataset$ over and over when there are multiple subset criteria. But, as the help page warns:
This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like ‘[’, and in particular the non-standard evaluation of argument ‘subset’ can have unanticipated consequences.
I've never had any problems, but the majority of my R code is scripting for my own use with relatively static inputs.
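For comparison, here is a minimal sketch of the same filter written with plain [ indexing, which is what the help page recommends for programming. It reuses the dataset and remove objects from the question; the names kept and same are only illustrative.

```r
# Example data and removal vector, copied from the question
dataset <- data.frame(
  grps = c("a","b","c","a","d","b","c","a","d","b","c","a"),
  response = c(1,4,2,6,4,7,8,9,4,5,0,3)
)
remove <- c("b", "c")

# Keep only the rows whose grps value is NOT in the removal vector
kept <- dataset[!(dataset$grps %in% remove), ]

# The subset() call from this answer, for comparison
same <- subset(dataset, !(grps %in% remove))

# Check that both approaches select the same rows
all.equal(kept, same)
```

The only practical difference here is that the column has to be spelled out as dataset$grps in the [ version.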
2013-04-12
I have now had problems. If you're building a package for CRAN, R CMD check will throw a NOTE if you use subset in this way in your code - it will wonder if grps is a global variable, even though subset is evaluating it within dataset's environment (not the global one). So if there's any possibility your code will end up in a package and you feel squeamish about NOTEs, stick with Rcoster's method.
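Rcoster's answer isn't reproduced in this post, so the following is only a sketch of a package-friendly version, assuming it amounts to standard [ subsetting. The helper name drop_groups and its arguments are hypothetical. Because the column is passed as a string, no bare variable name like grps appears in the function body, which should avoid the "no visible binding for global variable" NOTE.

```r
# Hypothetical helper (the name and arguments are illustrative, not from the thread):
# drops every row of `data` whose value in `column` appears in `values`.
drop_groups <- function(data, column, values) {
  # data[[column]] looks the column up by name, so no non-standard evaluation is involved
  data[!(data[[column]] %in% values), , drop = FALSE]
}

# Usage with the data from the question
dataset <- data.frame(
  grps = c("a","b","c","a","d","b","c","a","d","b","c","a"),
  response = c(1,4,2,6,4,7,8,9,4,5,0,3)
)
remove <- c("b", "c")

cleaned <- drop_groups(dataset, "grps", remove)
```

The drop = FALSE argument keeps the result a data frame even if the input has a single column.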
Upvotes: 1