fc9.30

Reputation: 2591

Unexpected behavior when combining unique() and == in data.table

I think there is a bug in the unique function of the data.table package (version 1.9.6):

Small example:

library(data.table)

test <- data.table(a = c("1", "1", "2", "2", "3", "4", "4", "4"),
                   b = letters[1:8], 
                   d = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE))

   a b     d
1: 1 a  TRUE
2: 1 b  TRUE
3: 2 c FALSE
4: 2 d FALSE
5: 3 e  TRUE
6: 4 f FALSE
7: 4 g FALSE
8: 4 h FALSE

test[d == TRUE, `:=` (b = "M")]
test <- unique(test, by = c("a", "b"))

   a b     d
1: 1 M  TRUE
2: 2 c FALSE
3: 2 d FALSE
4: 3 M  TRUE
5: 4 f FALSE
6: 4 g FALSE
7: 4 h FALSE

Up to this point everything is as expected. Now I want to select only the rows where column d is TRUE:

test[d == TRUE]
   a b    d
1: 1 M TRUE

But the result is wrong: the row with a = 3 (b = "M", d = TRUE) is missing.

Upvotes: 4

Answers (2)

jangorecki

Reputation: 16727

That bug has just been fixed in the development version of data.table.

library(data.table)
test <- data.table(a = c("1", "1", "2", "2", "3", "4", "4", "4"), 
                   b = letters[1:8], 
                   d = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE))
test[d == TRUE, `:=` (b = "M")]
test <- unique(test, by = c("a", "b"))
test[d == TRUE]
#   a b    d
#1: 1 M TRUE
#2: 3 M TRUE

The development version of data.table has already been published to the drat repository and can be installed with:

install.packages("data.table", repos="https://Rdatatable.github.io/data.table", type="source")

Thanks for reporting!

Upvotes: 5

Choubi

Reputation: 680

This does not fix the bug, but as a workaround the subset works with regular data.frame syntax:

test[test$d, ]

or

test[test$d == TRUE, ]
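
To check whether an installed version is affected, here is a minimal sketch that compares the expression from the question against the logical-vector subset above. It assumes data.table is installed; the variable names expected, idiom and actual are only illustrative. It also tries test[(d)], a data.table form in which the parentheses make i evaluate the column d as a plain logical vector.

```r
library(data.table)

test <- data.table(a = c("1", "1", "2", "2", "3", "4", "4", "4"),
                   b = letters[1:8],
                   d = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE))
test[d == TRUE, `:=` (b = "M")]
test <- unique(test, by = c("a", "b"))

# Reference result: subset with a plain logical vector (data.frame style)
expected <- test[test$d, ]

# Same subset, written so that i is the logical column d itself
idiom <- test[(d)]

# The expression from the question
actual <- test[d == TRUE]

# TRUE: `d == TRUE` agrees with the reference
# FALSE: this installation shows the behavior described in the question
identical(expected$a, actual$a)
```

If the last line returns FALSE, either of the two logical-vector forms can be used until the package is updated.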

Upvotes: 0
