sirallen

Reputation: 1966

Modify dplyr database query

I'm using dplyr to execute a Redshift query via the database connection src. lag works a little bit differently in Redshift (see https://github.com/tidyverse/dplyr/issues/962), so I'm wondering if it's possible to modify the query that's generated from the dplyr chain to remove the third parameter (NULL) in LAG. Example:

res <- tbl(src, 'table_name') %>% 
  group_by(groupid) %>%
  filter(value != lag(value)) %>%
  collect()

gives

Error in postgresqlExecStatement(conn, statement, ...) : 
  RS-DBI driver: (could not Retrieve the result : ERROR:  Default
    parameter not be supported for window function lag)

I can see the translated sql:

translated <- dbplyr::translate_sql(
  tbl(src, 'table_name') %>% 
    group_by(groupid) %>%
    filter(value != lag(value)) %>%
    collect()
  )

# <SQL> COLLECT(FILTER(GROUP_BY(TBL("src", 'table_name'), "groupid"), "value" != LAG("value", 1, NULL) OVER ()))

And I can modify it to remove the NULL parameter, which I think will solve the problem:

sub("(LAG\\(.*), NULL", "\\1", translated)

# <SQL> COLLECT(FILTER(GROUP_BY(TBL("src", 'table_name'), "groupid"), "value" != LAG("value", 1) OVER ()))

How can I execute this modified query?

Upvotes: 2

Views: 417

Answers (1)

edgararuiz

Reputation: 675

You should be able to use DBI::dbGetQuery(con, sub("(LAG\\(.*), NULL", "\\1", translated)) to run the new query.
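A minimal sketch of that approach follows. It assumes the SQL text comes from dbplyr::sql_render() applied to the lazy query (built without collect()), because the translate_sql() output shown in the question wraps the dplyr verbs as function calls and is probably not executable as-is. The SQL string below is a hand-written stand-in, not actual dbplyr output; strip_lag_default is a hypothetical helper name, and con is assumed to be an open DBI connection.

```r
# Hand-written stand-in for the rendered SQL. With a live connection it
# might instead be obtained as:
#   query    <- tbl(src, 'table_name') %>% group_by(groupid) %>%
#                 filter(value != lag(value))
#   rendered <- as.character(dbplyr::sql_render(query))
rendered <- paste0(
  'SELECT "value", LAG("value", 1, NULL) OVER (PARTITION BY "groupid") ',
  'AS "prev" FROM "table_name"'
)

# Hypothetical helper: drop the explicit NULL default from each LAG(...) call.
# [^()]* keeps the match inside a single pair of parentheses.
strip_lag_default <- function(sql) {
  gsub("(LAG\\([^()]*), NULL\\)", "\\1)", sql)
}

patched <- strip_lag_default(rendered)
print(patched)

# Assumed usage with an open DBI connection `con` (not defined here):
# res <- DBI::dbGetQuery(con, patched)
```

Using gsub rather than sub rewrites every LAG call in the statement, which matters if the query contains more than one.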

Upvotes: 1
