Reputation: 1966
I'm using dplyr to execute a Redshift query via the database connection src. lag works a little differently in Redshift (see https://github.com/tidyverse/dplyr/issues/962), so I'm wondering whether it's possible to modify the query generated from the dplyr chain to remove the third parameter (NULL) in LAG. Example:
res <- tbl(src, 'table_name') %>%
  group_by(groupid) %>%
  filter(value != lag(value)) %>%
  collect()
gives
Error in postgresqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not Retrieve the result : ERROR: Default
parameter not be supported for window function lag)
I can see the translated sql:
translated <- dbplyr::translate_sql(
  tbl(src, 'table_name') %>%
    group_by(groupid) %>%
    filter(value != lag(value)) %>%
    collect()
)
# <SQL> COLLECT(FILTER(GROUP_BY(TBL("src", 'table_name'), "groupid"), "value" != LAG("value", 1, NULL) OVER ()))
And I can modify it to remove the NULL parameter, which I think will solve the problem:
sub("(LAG\\(.*), NULL\\)", "\\1)", translated)
# <SQL> COLLECT(FILTER(GROUP_BY(TBL("src", 'table_name'), "groupid"), "value" != LAG("value", 1) OVER ()))
How can I execute this modified query?
Upvotes: 2
Views: 417
Reputation: 675
You should be able to use DBI::dbGetQuery(con, sub("(LAG\\(.*), NULL\\)", "\\1)", translated)) to run the new query, where con is the underlying DBI connection.
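As a minimal sketch of the string edit on its own, here is the substitution applied to a plain character vector. The sql_text value is a made-up query written for illustration, not the SQL that dbplyr actually generates for the chain in the question, and the pattern is slightly stricter than the one above: [^()]* keeps each match inside a single LAG(...) call, and gsub handles more than one call in the same query.

```r
# Hypothetical SQL string standing in for the real generated query.
sql_text <- 'SELECT "value", LAG("value", 1, NULL) OVER (PARTITION BY "groupid") AS "prev" FROM "table_name"'

# Drop the trailing ", NULL" argument from every LAG(...) call.
# [^()]* cannot cross a parenthesis, so the match stays inside one call.
fixed <- gsub("(LAG\\([^()]*), NULL\\)", "\\1)", sql_text)

print(fixed)
```

The resulting string could then be passed as the statement argument of DBI::dbGetQuery, assuming con is an open DBI connection to Redshift.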
Upvotes: 1