Jake Fisher

Reputation: 3310

How can I get full_join in dplyr to preserve data.tables?

I'm merging two data.tables using dplyr's full_join like so:

library(data.table)
library(dplyr)

set.seed(90088)
dt1 <- data.table(id = 1:10, var1 = sample(20:30, 10, replace = TRUE), key = "id")
dt2 <- data.table(id = 1:10, var2 = sample(40:50, 10, replace = TRUE), key = "id")

both <- full_join(dt1, dt2)

But the outcome is a data.frame, not a data.table.

class(both)
# [1] "data.frame"

I'd like to take advantage of the speed of data.table later in my code (ideally while still using dplyr). Is there an option in full_join to preserve the data.table class, or do I have to merge using data.table syntax?

Upvotes: 1

Views: 579

Answers (1)

Steph Locke

Reputation: 6146

Looking at the latest dplyr docs (currently v0.4.1), the underlying join methods for data.table (join.tbl_dt) do not yet support full_join(), unlike the data.frame methods (join.tbl_df).

My searches on the dplyr GitHub repository suggest there is currently no outstanding feature request for this. My suggestion is therefore to raise a request if you'd like to see it implemented, and to use data.table's merge() with all = TRUE in the interim.
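As a minimal sketch of the merge() workaround, reusing the dt1 and dt2 from the question: merge() dispatches to data.table's own method when given data.tables, and all = TRUE asks for a full outer join. The explicit by = "id" is my assumption about the intended join column, since it is the only column the two tables share.

```r
library(data.table)

set.seed(90088)
dt1 <- data.table(id = 1:10, var1 = sample(20:30, 10, replace = TRUE), key = "id")
dt2 <- data.table(id = 1:10, var2 = sample(40:50, 10, replace = TRUE), key = "id")

# all = TRUE keeps unmatched rows from both sides (a full outer join);
# by = "id" is assumed here to be the intended join column
both <- merge(dt1, dt2, by = "id", all = TRUE)

class(both)
# [1] "data.table" "data.frame"
```

Because the result is still a data.table, later steps can keep using data.table syntax on it.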

Upvotes: 3
