asachet
asachet

Reputation: 6921

Generate CROSS JOIN queries with dbplyr

Given 2 remote tables (simulated with tbl_lazy for this example)

library("dplyr")
library("dbplyr")

t1 <- tbl_lazy(df = iris, src = dbplyr::simulate_mysql())
t2 <- tbl_lazy(df = mtcars, src = dbplyr::simulate_mysql())

How can I perform an actual* cross join between t1 and t2 using R and dbplyr?

* i.e. using CROSS JOIN in the translated SQL query

Note that I know how to perform all the other types of joins, this is precisely about CROSS joins.

I am aware of the following trick:

joined <- t1 %>%
  mutate(tmp = 1) %>%
  full_join(mutate(t2, tmp = 1), by = "tmp") %>%
  select(-tmp)

However

  1. This is ugly (even if it could be hidden in a function)
  2. I would like to take advantage of the highly optimised join capabilities of the DB, so I'd like to pass a real SQL CROSS JOIN. Using show_query(joined) shows that the generated SQL query uses LEFT JOIN.

Sadly, there is no cross_join operator in dplyr and sql_join(t1, t2, type = "cross") does not work either (not implemented for tbls, works only on DB connections).

How can I generate an SQL CROSS JOIN with dbplyr?

Upvotes: 6

Views: 538

Answers (1)

Gregor Thomas
Gregor Thomas

Reputation: 145775

According to the dbplyr NEWS file, since version 1.10, if you use a full_join(..., by = character()), it will "promote" the join to a cross join. This doesn't seem to be documented anywhere else yet, but searching the dbplyr Github repo for "cross" turned it up in both code and the NEWS file.

This syntax does not seem to work for local data frames, only via SQL.

Upvotes: 5

Related Questions