Matt Mills
Matt Mills

Reputation: 608

Can't use dplyr joins on two date columns

I'm encountering an error where I try to join two data frames using the dplyr join functions by two Date columns. This is the error I get:

Error: cannot join on columns 'DateInfo' x 'DateInfo': Can't join on 'DateInfo' x 'DateInfo' because of incompatible types (Date / Date)

The base merge function works fine and I can't seem to find an example of what could be causing this through googling or other stack overflow questions.

The problem is I can't create a reproducible example and the data I am using I can't share. For example this works with no problems:

d1 <- data_frame(Num = 1:5, DateInfo = as.Date(c("2014-01-03", "2014-04-05", "2015-01-03", "2014-04-02", "2011-07-28"), format = "%Y-%m-%d"))
d2 <- data_frame(Name = c("a", "b", "c", "d", "e"), DateInfo = as.Date(c("2014-01-03", "2014-04-05", "2015-01-03", "2014-04-02", "2011-07-28"), format = "%Y-%m-%d"))
d3 <- left_join(d1, d2, by = c("DateInfo" = "DateInfo"))

Has anyone had any experience with not being able to join on two columns that are, as far as the class function is concerned, are the same type but still getting this error?

EDIT: Just to get this out of the way I can get around this error by using merge or converting the dates to characters and then joining, so I'm really just interested in why dplyr would tell me I can't merge on two columns with the same type.

Upvotes: 13

Views: 12297

Answers (2)

Paul
Paul

Reputation: 155

I just had this exact same issue. Two data frames, each with a POSIXct date_time column and the dplyr join functions (by = "date_time") would not work due to incompatible types. Thanks to Matt Mills, I used the dput function to investigate the POSIXct columns and found that, even though both were POSIXct, one came out numeric and the other was character.

I fixed this by going back to where I created my POSIXct object and used this code:

df_temp <- df_temp %>% 
mutate(date_time = as.numeric(date_time)) %>% 
mutate(date_time = as.POSIXct(date_time, tz = tz_in, origin = "1970-01-01 00:00:00"))

Its weird...its like the POSIXct format remembers its original type. My added code forced the date_time fields in both variables to be numeric before converting to POSIXct.

dplyr::inner_join now works. Thanks for this thread; saved my bacon. ;)

Upvotes: 1

Matt Mills
Matt Mills

Reputation: 608

The reason I can't merge is how the two Date objects are stored. Thanks to this issue I decided to check the structure of how the two objects are stored and sure enough one is stored as an integer and one is stored as a numeric:

> dput(df1$DateInfo[1])
structure(16373, class = "Date")
> dput(df2$DateInfo[1])
structure(16372L, class = "Date")

It appears that the data that was pulled form a DB through the dplyr sql functions is stored as a numeric while the data from a csv is stored as an integer. I don't know why that won't let dplyr join on them while merge can or why it happens in the first place but I think this specific question is answered.

Upvotes: 10

Related Questions