Reputation: 6931
I am using dbplyr to write and run queries in SQL Server, and want to apply a conditioned mutate. This can be done using ifelse
or using case_when
. The query works when using ifelse
but throws and exception when using case_when
.
The issue appears to be the SQL syntax that the two commands get translated into. The case_when
syntax does not appear to be valid SQL. Can you tell me why and how to fix it? Or is this a bug?
# libraries
library(DBI)
library(dplyr)
library(dbplyr)
# establish connection to database table
connection_string = "database.specific.string"
# mine looks something like "DRIVER=...; Trusted_Connection=...; DATABASE=...' SERVER=..."
db_connection = dbConnect(odbc::odbc(), .connection_string = connection_string)
my_table = tbl(db_connection, from = my_table_name)
# attempted query
tmp = my_table %>%
mutate(new_col = case_when(col1 == col2 ~ "a",
col1 != col2 ~ "b"))
# check SQL code for query
show_query(tmp)
The resulting SQL query is:
SELECT
col1, col2,
CASE
WHEN CONVERT(BIT, IIF(col1 = col2, 1.0, 0.0))) THEN ('a')
WHEN CONVERT(BIT, IIF(col1 <> col2, 1.0, 0.0))) THEN ('b')
END AS new_col
FROM my_database.my_table_name
Running this code throws an error
An expression of non-boolean type specified in a context where a condition is expected, near 'THEN'
However the ifelse
query works as expected:
# attempted query
tmp = my_table %>%
mutate(new_col = ifelse(col1 == col2, "a", "b"))
# check SQL code for query
show_query(tmp)
The resulting SQL query is:
SELECT
col1, col2,
CASE
WHEN (CONVERT(BIT, IIF(col1 = col2, 1.0, 0.0))) = TRUE) THEN ('a')
WHEN (CONVERT(BIT, IIF(col1 = col2, 1.0, 0.0))) = FALSE) THEN ('b')
END AS new_col
FROM my_database.my_table_name
Note that in both cases the SQL syntax have been produced using show_query
. Using translate_sql
to produce the SQL code consistently produced much cleaner SQL syntax, but this is not the syntax that gets run on the server.
Does anyone else get these SQL queries? Any suggestions as to what is wrong and how to fix this?
Posted as an issue on the tidyverse and was informed that a solution has already been developed for case_when(..., TRUE ~ "b")
being translated to ELSE 'b'
(here).
However, as this does not address the syntax that is causing this exception. Question edited to focus on syntax that is the cause.
Posted as an issue on dbplyr. A response from Christophe Dervieux (cderv) identifies that the cause appears to be that SQL server needs a special translation for case_when
like it has for ifelse
.
In the mean time, users can use multiple ifelse
or if_else
statements.
Upvotes: 5
Views: 1670
Reputation: 6931
See Update 2 above:
This appears to be an issue effecting case_when
for SQL server.
The work-around at present is to use multiple ifelse
or if_else
statements:
data %>%
mutate(new_col = ifelse(condition1, val1, NA)) %>%
mutate(new_col = ifelse(is.na(new_col) & condition2, val2, new_col)) %>%
mutate(new_col = ifelse(is.na(new_col) & condition3, val3, new_col))
# etc
Upvotes: 0
Reputation: 1973
Is it just that you have got your dplyr syntax slightly wrong?
Try this
# attempted query
tmp = my_table %>%
mutate(new_col = case_when(col1 == col2 ~ "a",
col1 == 'TRUE' ~ "b"
# alternatively col1 == 1 ~ "b"
))
Upvotes: 0