Snowflake

Reputation: 3081

Why does %in% compare the data type, while == can compare strings?

I have a rather basic question. Can someone explain to me why the former works while the latter does not, and why the Date data type matters?

library(data.table)

test.table <- data.table(Dates = 
                           as.Date(c("2020-08-31", "2020-01-31", "2020-08-31", "2010-01-01")))

# works: returns the rows dated 2020-08-31
test.table[Dates == "2020-08-31"]

# does not match any rows
test.table[Dates %in% c("2020-08-31")]

Upvotes: 0

Views: 42

Answers (2)

Roland

Reputation: 132706

This is not specific to data.table. The documentation in help("%in%") says this:

Factors, raw vectors and lists are converted to character vectors, and then x and table are coerced to a common type (the later of the two types in R's ordering, logical < integer < numeric < complex < character) before matching.

The common type of a Date variable (internally stored as double) and a character variable is "character". Since the documentation refers to types, not classes, as.character.Date is not involved. I assume the internal doubles of the Date variable are coerced to character and compared.
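To make the type-versus-class distinction concrete, here is a minimal sketch that reproduces the assumed coercion by hand. The variable name `d` is illustrative, and `unclass()` is only a stand-in for the internal type-level coercion described above, not what `match()` literally calls (the internals of `match()` may also differ between R versions).

```r
d <- as.Date("2020-08-31")

# The class is "Date", but the storage type is double:
# the number of days since 1970-01-01.
class(d)    # "Date"
typeof(d)   # "double"
unclass(d)  # 18505

# Type-level coercion ignores the Date class, so the double becomes
# the day count as a string, which can never equal "2020-08-31".
as.character(unclass(d))                  # "18505"
as.character(unclass(d)) == "2020-08-31"  # FALSE

# Class-aware conversion (as.character.Date) formats the date instead.
as.character(d)                           # "2020-08-31"
```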

You should never rely on automatic coercion for comparisons. Always coerce explicitly:

test.table[Dates %in% as.Date("2020-08-31")]

test.table[Dates == as.Date("2020-08-31")]
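As for why == already works with a bare string: == on a Date dispatches to the Ops.Date group method, which converts a character operand with as.Date() before comparing. The sketch below uses a plain vector `dates` as a stand-in for `test.table$Dates` (so it runs without data.table); `target` and the result names are illustrative.

```r
dates  <- as.Date(c("2020-08-31", "2020-01-31", "2020-08-31", "2010-01-01"))
target <- as.Date("2020-08-31")

# With both sides explicitly Date, == and %in% compare the same values.
eq_explicit <- dates == target      # TRUE FALSE TRUE FALSE
in_explicit <- dates %in% target    # TRUE FALSE TRUE FALSE

# == with a bare string also works, because Ops.Date converts the
# character operand with as.Date() before the comparison.
eq_string <- dates == "2020-08-31"  # TRUE FALSE TRUE FALSE
```

Inside a data.table, the same expressions go into `test.table[...]`.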

Upvotes: 3

LC-datascientist

Reputation: 2096

Regarding

test.table[Dates %in% c("2020-08-31")]

c("2020-08-31") is a character vector, while test.table$Dates has class Date. Therefore, %in% does not find a match between them.

If you convert Dates to character, or c("2020-08-31") to Date, either conversion produces the expected matches:

test.table[as.character(Dates) %in% c("2020-08-31")]

test.table[Dates %in% as.Date("2020-08-31")]
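Both conversions can be checked side by side with a short sketch. It assumes a plain vector `dates` standing in for `test.table$Dates`, so it needs no data.table; `by_string` and `by_date` are illustrative names.

```r
dates <- as.Date(c("2020-08-31", "2020-01-31", "2020-08-31", "2010-01-01"))

# Option 1: format the Dates as "YYYY-MM-DD" strings and match strings.
by_string <- as.character(dates) %in% c("2020-08-31")

# Option 2: parse the lookup value as a Date and match Dates.
by_date <- dates %in% as.Date("2020-08-31")

by_string  # TRUE FALSE TRUE FALSE
by_date    # TRUE FALSE TRUE FALSE
```

Option 2 is arguably the safer habit: the comparison happens on Date values, whereas option 1 only matches lookup strings written exactly in the "YYYY-MM-DD" form that as.character() produces for a Date.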

Upvotes: 0
