I had a script that worked fine on one dataset, but when I used it on another dataset it stopped working, seemingly for no reason. I tracked the 'bug' down to condition-based indexing of a data frame where the index evaluated to -0, which removed all rows of the data frame rather than none. Obviously it doesn't make sense to index by -0 intentionally, and it would be easy to guard against in a script with an if statement. Still, it seemed like strange behaviour, and I wondered whether there is a reason for it, because it doesn't seem like the logical thing to do in this case.
To give an example, to me test[-which(numbers > 8),] should return the entire dataframe:
letters <- c("a","b","c","d","e","f","g")
numbers <- c(1,2,3,4,5,6,7)
test <- data.frame(letters,numbers)
> test
  letters numbers
1       a       1
2       b       2
3       c       3
4       d       4
5       e       5
6       f       6
7       g       7
> test[which(numbers > 3),]
  letters numbers
4       d       4
5       e       5
6       f       6
7       g       7
> test[-which(numbers > 3),]
  letters numbers
1       a       1
2       b       2
3       c       3
> test[which(numbers > 8),]
[1] letters numbers
<0 rows> (or 0-length row.names)
> test[-which(numbers > 8),]
[1] letters numbers
<0 rows> (or 0-length row.names)
Upvotes: 1
Reputation: 887971
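Using - with which() is error-prone because it fails when no elements match. In that case which() returns integer(0), negating an empty vector still gives integer(0) (no -0 is involved), and indexing with an empty vector selects zero rows: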
> which(numbers > 8)
integer(0)
> -which(numbers > 8)
integer(0)
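The same effect can be reproduced on a plain vector. The following is a minimal sketch, where `x` is a hypothetical example vector that is not part of the question: negation is applied element-wise, so there is nothing to negate in a zero-length index, and an empty index selects nothing.

```r
x <- c(10, 20, 30)        # hypothetical example vector

idx <- which(x > 100)     # no element matches, so idx is integer(0)
length(idx)               # 0
identical(-idx, idx)      # TRUE: negating an empty vector changes nothing

x[idx]                    # numeric(0): an empty index selects nothing
x[-idx]                   # numeric(0) as well, not the full vector

x[-0]                     # also numeric(0): -0 is just 0, and index 0 selects nothing
```

So the result does not come from a special rule for -0; the index is simply empty by the time it reaches the subsetting operator.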
Instead, to get the complementary rows, use setdiff with the sequence of row indices (or index with a logical vector, as mentioned in the comments):
test[setdiff(seq_len(nrow(test)), which(numbers > 8)),]
  letters numbers
1       a       1
2       b       2
3       c       3
4       d       4
5       e       5
6       f       6
7       g       7
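The logical-vector alternative is not spelled out above, so here is a sketch of what it presumably looks like: negate the condition itself with `!` rather than negating the positions from which(). The helper `drop_rows()` is a hypothetical name introduced only to illustrate the if-statement guard the question mentions.

```r
letters <- c("a", "b", "c", "d", "e", "f", "g")
numbers <- c(1, 2, 3, 4, 5, 6, 7)
test <- data.frame(letters, numbers)

# Negate the logical condition instead of negating positions from which()
keep_all  <- test[!(test$numbers > 8), ]   # no row matches, so every row is kept
keep_some <- test[!(test$numbers > 3), ]   # rows 1 to 3 are kept

nrow(keep_all)
nrow(keep_some)

# Hypothetical helper: drop rows by position, but only when there is
# something to drop, so an empty index leaves the data frame untouched
drop_rows <- function(df, idx) {
  if (length(idx) == 0) df else df[-idx, , drop = FALSE]
}

nrow(drop_rows(test, which(test$numbers > 8)))
```

One difference to keep in mind: if the condition can contain NA, logical indexing returns an NA-filled row for each NA, whereas which() silently drops those positions.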
Upvotes: 1