Luke

Reputation: 4929

Remove values from a dataset based on a vector of those values

I have a dataset that looks like this, except that the real one is much longer and has many more distinct values:

dataset <- data.frame(grps = c("a","b","c","a","d","b","c","a","d","b","c","a"), response = c(1,4,2,6,4,7,8,9,4,5,0,3))

In R, I would like to remove all rows containing the values "b" or "c" using a vector of values to remove, i.e.

remove <- c("b", "c")

The actual dataset is very long, with many hundreds of values to remove, so removing them one by one would be very time-consuming.

Upvotes: 1

Views: 2789

Answers (2)

Matt Parker

Reputation: 27359

There's also subset:

subset(dataset, !(grps %in% remove))

... which is really just a wrapper around [ that lets you skip writing dataset$ over and over when there are multiple subset criteria. But, as the help page warns:

This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like ‘[’, and in particular the non-standard evaluation of argument ‘subset’ can have unanticipated consequences.

I've never had any problems, but the majority of my R code is scripting for my own use with relatively static inputs.
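To illustrate the point about multiple criteria, here is a small sketch using the question's example data. The extra condition `response > 3` is an assumption added only to show a second criterion; it is not part of the original question.

```r
dataset <- data.frame(
  grps = c("a","b","c","a","d","b","c","a","d","b","c","a"),
  response = c(1,4,2,6,4,7,8,9,4,5,0,3)
)
remove <- c("b", "c")

# With subset(), column names are looked up inside dataset
via_subset <- subset(dataset, !(grps %in% remove) & response > 3)

# With [, every column has to be spelled out as dataset$...
via_bracket <- dataset[!(dataset$grps %in% remove) & dataset$response > 3, ]

# Compare the two results
all.equal(via_subset, via_bracket)
```

The two forms select the same rows here. They can differ when the condition evaluates to NA: subset drops those rows, while [ returns a row of NAs for each of them.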


2013-04-12

I have now had problems. If you're building a package for CRAN, R CMD check will throw a NOTE if you use subset in this way in your code - it will wonder if grps is a global variable, even though subset is evaluating it within dataset's environment (not the global one). So if there's any possibility your code will end up in a package and you feel squeamish about NOTEs, stick with Rcoster's method.
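For package code, one way to avoid the unquoted column name is to wrap the [ approach in a function that takes the column name as a string. This is only a sketch: `drop_groups` and its argument names are hypothetical, chosen for illustration.

```r
# Hypothetical helper: drop rows whose value in `column` appears in `values`.
# The column is referenced by name as a string, so the function body contains
# no bare column symbol such as grps.
drop_groups <- function(data, column, values) {
  data[!(data[[column]] %in% values), , drop = FALSE]
}

dataset <- data.frame(
  grps = c("a","b","c","a","d","b","c","a","d","b","c","a"),
  response = c(1,4,2,6,4,7,8,9,4,5,0,3)
)

cleaned <- drop_groups(dataset, "grps", c("b", "c"))
cleaned
```

The `drop = FALSE` argument keeps the result a data frame even if `data` has only one column.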

Upvotes: 1

Rcoster

Reputation: 3210

Try:

dataset[!(dataset$grps %in% remove),]
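Broken into steps on the question's example data, the line above works roughly as follows. The intermediate names `drop` and `kept` are introduced here only for illustration.

```r
dataset <- data.frame(
  grps = c("a","b","c","a","d","b","c","a","d","b","c","a"),
  response = c(1,4,2,6,4,7,8,9,4,5,0,3)
)
remove <- c("b", "c")

# %in% gives one TRUE/FALSE per row: TRUE where grps is listed in remove
drop <- dataset$grps %in% remove

# Negate the vector to keep the other rows; the empty slot after the
# comma selects all columns
kept <- dataset[!drop, ]
kept
```

Because %in% never returns NA, rows where grps is missing are kept rather than turned into NA rows. The length of `remove` does not change the code, so the same line works for hundreds of values.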

Upvotes: 5
