ericgtaylor
ericgtaylor

Reputation: 311

dbplyr window function for grouped mutate operations

I would like to use dbplyr/dplyr to create a group-level aggregate variable. Here is the gist of the command:

q = tbl_copy %>% 
  group_by(group_var) %>%
  mutate(x_agg = min(x))

But this produces the following error: Error: Window function `min()` is not supported by this database

Oddly, when I use the summarise() verb, min() works just fine.

q = tbl_copy %>% 
  group_by(group_var) %>%
  summarise(x_agg = min(x))

q %>% show_query()
<SQL>
SELECT `group_var`, `x`, MIN(`x`) AS `x_agg`
FROM `my_table`
GROUP BY `group_var`

What am I missing? How do I use group_by() with mutate() on a copy of a MySQL table?

UPDATE: A REPRODUCIBLE EXAMPLE

> con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
> copy_to(con, mtcars)
> 
> mtcars2 <- tbl(con, "mtcars")
> mtcars2 %>%
+     select(mpg,cyl) %>%
+     group_by(cyl) %>%
+     mutate(mpg_min = min(mpg))
Error: Window function `min()` is not supported by this database

Upvotes: 3

Views: 868

Answers (1)

Scarabee
Scarabee

Reputation: 5704

MySQL does not support window functions, that's why dbplyr cannot translate your dplyr code into SQL.

When you work with a MySQL DB, the usual workaround is to use nested SQL queries such as this one:

select yt.*, t.x_agg
from yourtable yt inner join (select group_var, min(x) as x_agg 
                              from yt 
                              group by group_var) t 
                    on yt.group_var = t.group_var

You can either send the above query directly to MySQL with dbGetQuery or transpose this strategy into dplyr code:

tbl_copy %>% 
  inner_join(tbl_copy %>% 
               group_by(group_var) %>%
               summarise(x_agg = min(x)), by = "group_var")

Also note that if your table is small enough, a more direct way is to do everything in memory (I mean: on client side, i.e. in R).

Upvotes: 3

Related Questions