Reputation: 311
I would like to use dbplyr
/dplyr
to create a group-level aggregate variable. Here is the gist of the command:
q = tbl_copy %>%
group_by(group_var) %>%
mutate(x_agg = min(x))
But this produces the following error:
Error: Window function `min()` is not supported by this database
Oddly, when I use the summarise()
verb, min()
works just fine.
q = tbl_copy %>%
group_by(group_var) %>%
summarise(x_agg = min(x))
q %>% show_query()
<SQL>
SELECT `group_var`, `x`, MIN(`x`) AS `x_agg`
FROM `my_table`
GROUP BY `group_var`
What am I missing? How do I use group_by()
with mutate()
on a copy of a MySQL table?
UPDATE: A REPRODUCIBLE EXAMPLE
> con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
> copy_to(con, mtcars)
>
> mtcars2 <- tbl(con, "mtcars")
> mtcars2 %>%
+ select(mpg,cyl) %>%
+ group_by(cyl) %>%
+ mutate(mpg_min = min(mpg))
Error: Window function `min()` is not supported by this database
Upvotes: 3
Views: 868
Reputation: 5704
MySQL does not support window functions, that's why dbplyr cannot translate your dplyr code into SQL.
When you work with a MySQL DB, the usual workaround is to use nested SQL queries such as this one:
select yt.*, t.x_agg
from yourtable yt inner join (select group_var, min(x) as x_agg
from yt
group by group_var) t
on yt.group_var = t.group_var
You can either send the above query directly to MySQL with dbGetQuery
or transpose this strategy into dplyr code:
tbl_copy %>%
inner_join(tbl_copy %>%
group_by(group_var) %>%
summarise(x_agg = min(x)), by = "group_var")
Also note that if your table is small enough, a more direct way is to do everything in memory (I mean: on client side, i.e. in R).
Upvotes: 3