Roasty247

Reputation: 731

Unexpected result indexing data frame by -0 in R

I had a script that worked fine on one dataset, but when I ran it on another dataset it failed, seemingly for no reason. I traced the 'bug' to condition-based indexing of a data frame: the index evaluated to -0, which removed all rows of the data frame rather than none. Obviously it doesn't make sense to index by -0 intentionally, and it would be easy to guard against in a script with an if statement. Still, it seemed like strange behaviour, and I wondered whether there is a reason for it, because it doesn't seem like the logical thing to do in this case.

To give an example: to me, test[-which(numbers > 8),] should return the entire data frame:

letters <- c("a","b","c","d","e","f","g")
numbers <- c(1,2,3,4,5,6,7)

test <- data.frame(letters,numbers)

> test
  letters numbers
1       a       1
2       b       2
3       c       3
4       d       4
5       e       5
6       f       6
7       g       7

> test[which(numbers > 3),]
  letters numbers
4       d       4
5       e       5
6       f       6
7       g       7

> test[-which(numbers > 3),]
  letters numbers
1       a       1
2       b       2
3       c       3

> test[which(numbers > 8),]
[1] letters numbers
<0 rows> (or 0-length row.names)

> test[-which(numbers > 8),]
[1] letters numbers
<0 rows> (or 0-length row.names)

Upvotes: 1

Views: 57

Answers (1)

akrun

Reputation: 887971

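Using - with which() is fragile because it fails when no elements match: which() then returns integer(0), negating that still gives integer(0), and indexing by an empty vector selects zero rows rather than dropping none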

> which(numbers > 8)
integer(0)
> -which(numbers > 8)
integer(0)
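
The same effect can be reproduced on a plain vector, which may make the cause easier to see. This is only a minimal sketch; x and idx are illustrative names, not part of the original script:

```r
x <- c(10, 20, 30)

# No element matches, so which() returns an empty integer vector
idx <- which(x > 100)

length(-idx)          # 0: there is nothing to negate
identical(-idx, idx)  # TRUE: -integer(0) is still integer(0)

x[idx]                # numeric(0)
x[-idx]               # numeric(0), not the full vector
x[-0]                 # numeric(0) as well: -0 is just 0, which selects nothing
```

So R never sees a "drop nothing" request here, only an empty index, and an empty index selects nothing.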

Instead, to get the complementary rows, use setdiff() with the sequence of row indices, or negate the logical condition directly (the approach mentioned in the comments)

test[setdiff(seq_len(nrow(test)), which(numbers > 8)),]
  letters numbers
1       a       1
2       b       2
3       c       3
4       d       4
5       e       5
6       f       6
7       g       7
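
For comparison, here is a rough sketch of the logical-vector alternative, which skips which() entirely. The result names kept_all and kept_some are made up for illustration, and the data frame is rebuilt so the snippet runs on its own:

```r
test <- data.frame(letters = c("a", "b", "c", "d", "e", "f", "g"),
                   numbers = c(1, 2, 3, 4, 5, 6, 7))

# Negate the condition itself: TRUE marks the rows to keep
kept_all <- test[!(test$numbers > 8), ]
nrow(kept_all)    # 7: nothing matched, so nothing is dropped

kept_some <- test[!(test$numbers > 3), ]
nrow(kept_some)   # 3: rows with numbers 1, 2 and 3
```

One caveat: if numbers could contain NA, the logical version yields NA rows for those entries, whereas which() silently skips them, so the two approaches are only interchangeable when the condition has no missing values.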

Upvotes: 1
