Reputation: 502
I've stumbled upon weird data.table i behavior that returns a row with NAs where I would expect an empty data.table. See:
a = data.table(a = 1, d = NA)
a[!is.na(a) & d == "3"]
# a d
# 1: NA NA
I would expect an empty data table as a result here. Compare to:
a = data.table(a = c(1,2), d = c(NA,3))
a[!is.na(a) & d == "3"]
# a d
# 1: 2 3
This one does not produce an extra row with NA values, though.
Is this a bug in data.table, or is there some logic underlying this behavior that someone could explain?
Upvotes: 3
Views: 179
Reputation: 118889
Thanks for the ping @SergiiZaskaleta. I forgot to update this question, but this was fixed a while ago, with this commit.
From NEWS:
- Subsets using logical expressions in i never returns all-NA rows. Edge case DT[NA] is now fixed, #1252. Thanks to @sergiizaskaleta.
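To check whether an installed version includes the fix, the example from the question can be rerun. This is only a sketch: it assumes a data.table version that already contains the fix for #1252, and the names dt and res are made up for illustration.

```r
library(data.table)

dt <- data.table(a = 1, d = NA)

# d == "3" evaluates to NA here, so the condition is a single logical NA.
res <- dt[!is.na(a) & d == "3"]

# With the fix, NA in a logical i is treated as FALSE,
# so no rows should be returned.
nrow(res)

# The edge case named in the NEWS entry can be checked the same way.
nrow(dt[NA])
```

If the first nrow() call still reports one row, the installed version most likely predates the fix.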
Upvotes: 1
Reputation: 16121
Don't know if it's a bug or not, but it seems it has to do with the type of your variable d.
a = data.table(a = 1, d = NA)
str(a)
# Classes ‘data.table’ and 'data.frame': 1 obs. of 2 variables:
# $ a: num 1
# $ d: logi NA
# - attr(*, ".internal.selfref")=<externalptr>
a[!is.na(a) & d == "3"] # this returns NAs
# a d
# 1: NA NA
a[!is.na(a) & !is.na(d)] # this returns nothing
# Empty data.table (0 rows) of 2 cols: a,d
This one also works:
a = data.table(a = 1, d = 4)
str(a)
# Classes ‘data.table’ and 'data.frame': 1 obs. of 2 variables:
# $ a: num 1
# $ d: num 4
# - attr(*, ".internal.selfref")=<externalptr>
a[!is.na(a) & d == "3"]
# Empty data.table (0 rows) of 2 cols: a,d
Looks like comparing the logical NA in d with "3" gives NA rather than FALSE, so the whole condition is NA and data.table returns a row of NAs. However, with the dplyr package it seems to work:
library(dplyr)
a = data.table(a = 1, d = NA)
a %>% filter(!is.na(a) & d == "3")
# Empty data.table (0 rows) of 2 cols: a,d
The same with the subset command:
subset(a, !is.na(a) & d == "3")
# Empty data.table (0 rows) of 2 cols: a,d
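Where the NA comes from can be traced with base R alone, without data.table. A minimal sketch (the names d_lgl, d_num, cmp_lgl, cmp_num, cond, df and kept are made up for illustration):

```r
# A column that holds only NA is of type logical.
d_lgl <- NA
class(d_lgl)

# Comparing NA with any value gives NA, not FALSE.
cmp_lgl <- d_lgl == "3"
cmp_lgl                  # NA

# A numeric value is coerced to character, so this compares "4" with "3".
d_num <- 4
cmp_num <- d_num == "3"
cmp_num                  # FALSE

# TRUE & NA stays NA, so the whole filter condition is NA for that row.
cond <- !is.na(1) & cmp_lgl
cond                     # NA

# subset() keeps only rows where the condition is TRUE and not NA.
df <- data.frame(a = 1, d = NA)
kept <- subset(df, !is.na(a) & d == "3")
nrow(kept)               # 0
```

So the condition passed as i in the first example is a single NA, while subset() and dplyr's filter() drop rows whose condition evaluates to NA.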
Upvotes: 1