Reputation: 133
I'm looking to remove 7 rows from a large dataset (>400 rows), based on the values in a certain column. I am having issues with this simple endeavour.
##Generate sample dataset
Site.Num=c(1:20)
Year=c(1990:2009)
Day=c(10:29)
Final<-data.frame(Site.Num,Year,Day)
##I would like to remove 5 rows, based on 5 sites from the Site.Num column
Final <- Final[which(Final$Site.Num!=c(1,4,10,11,14)), ]
##I receive this warning message
Warning message:
In Final$Site.Num != c(1, 4, 10, 11, 14) :
longer object length is not a multiple of shorter object length
Upvotes: 2
Views: 7782
Reputation: 42649
The warning appears because you're using != to compare vectors of different lengths, so the shorter one is recycled. The warning is important here, because the comparison is not computing what you expect.
For example (using == for clarity), if you want to see which values of c(1,2,2) are contained in c(1,2), consider this expression:
> c(1,2,2) == c(1,2)
[1] TRUE TRUE FALSE
Warning message:
In c(1, 2, 2) == c(1, 2) :
longer object length is not a multiple of shorter object length
But 2 is clearly in both vectors. The FALSE value is because the vector on the right is being recycled, so these are the actual values compared:
> c(1,2,2) == c(1,2,1)
[1] TRUE TRUE FALSE
However, in the former case, the vector on the right is not recycled an integral number of times, which is what triggers the warning. This usually means that you did something that you didn't expect. You want the operator %in%, which tests set membership for each element:
> c(1,2,2) %in% c(1,2)
[1] TRUE TRUE TRUE
No warning, and the expected answer.
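The warning is only a hint, not a guarantee. As a minimal sketch (the variable names x, set, eq_result and in_result are made up for illustration), when the longer length happens to be a multiple of the shorter one, == recycles silently and still gives the wrong answer for a membership test:

```r
x   <- c(1, 2, 2, 1)
set <- c(1, 2)

# set is recycled to c(1, 2, 1, 2); 4 is a multiple of 2, so R emits no warning
eq_result <- x == set
print(eq_result)   # TRUE TRUE FALSE FALSE

# %in% asks "is each element of x anywhere in set?" and does no recycling
in_result <- x %in% set
print(in_result)   # TRUE TRUE TRUE TRUE
```

So the absence of a warning does not mean that == or != did what was intended.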
For your question, here is the command to get the desired rows:
Final <- Final[!(Final$Site.Num %in% c(1,4,10,11,14)), ]
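Note that which neither helps nor hurts in this statement. The case to watch out for is negative indexing, such as Final[-which(condition), ]: if no row matches the condition, which returns an empty vector and the result has zero rows instead of all of them.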
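To see the difference on the sample data from the question, here is a small sketch. The names drop_sites, wrong, right and none are hypothetical helpers, not part of the original post. Note that with 20 rows and 5 sites the lengths divide evenly, so the != version runs without any warning on the sample, yet it removes only one row:

```r
# Sample data from the question
Final <- data.frame(Site.Num = 1:20, Year = 1990:2009, Day = 10:29)
drop_sites <- c(1, 4, 10, 11, 14)   # hypothetical name for the sites to remove

# != recycles drop_sites four times and compares position by position;
# only the first row (1 vs 1) is dropped
wrong <- Final[Final$Site.Num != drop_sites, ]
nrow(wrong)   # 19

# %in% tests membership, so all five sites are dropped
right <- Final[!(Final$Site.Num %in% drop_sites), ]
nrow(right)   # 15

# Negative indexing with which() misbehaves when nothing matches
none <- Final$Site.Num %in% c(99)   # site 99 does not exist, so all FALSE
nrow(Final[!none, ])                # 20, every row kept
nrow(Final[-which(none), ])         # 0, every row lost
```

Logical indexing with ! is therefore the safer default when the set of values to drop might not occur in the data.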
Upvotes: 4
Reputation: 23574
With the dplyr package, you can do something like this.
library(dplyr)
filter(Final, !Site.Num %in% c(1,4,10,11,14))
Upvotes: 1