Simon.S.A.
Simon.S.A.

Reputation: 6931

Combining dbplyr and case_when in SQL Server

I am using dbplyr to write and run queries in SQL Server, and want to apply a conditioned mutate. This can be done using ifelse or using case_when. The query works when using ifelse but throws and exception when using case_when.

The issue appears to be the SQL syntax that the two commands get translated into. The case_when syntax does not appear to be valid SQL. Can you tell me why and how to fix it? Or is this a bug?

# libraries
library(DBI)
library(dplyr)
library(dbplyr)

# establish connection to database table
connection_string = "database.specific.string"
# mine looks something like "DRIVER=...; Trusted_Connection=...; DATABASE=...' SERVER=..."
db_connection = dbConnect(odbc::odbc(), .connection_string = connection_string)
my_table = tbl(db_connection, from = my_table_name)

# attempted query
tmp = my_table %>%
    mutate(new_col = case_when(col1 == col2 ~ "a",
                               col1 != col2 ~ "b"))

# check SQL code for query
show_query(tmp)

The resulting SQL query is:

SELECT 
    col1, col2,
    CASE
       WHEN CONVERT(BIT, IIF(col1 = col2, 1.0, 0.0))) THEN ('a')
       WHEN CONVERT(BIT, IIF(col1 <> col2, 1.0, 0.0))) THEN ('b')
    END AS new_col
FROM my_database.my_table_name

Running this code throws an error

An expression of non-boolean type specified in a context where a condition is expected, near 'THEN'

However the ifelse query works as expected:

# attempted query
tmp = my_table %>%
    mutate(new_col = ifelse(col1 == col2, "a", "b"))

# check SQL code for query
show_query(tmp)

The resulting SQL query is:

SELECT 
    col1, col2,
    CASE
       WHEN (CONVERT(BIT, IIF(col1 = col2, 1.0, 0.0))) = TRUE) THEN ('a')
       WHEN (CONVERT(BIT, IIF(col1 = col2, 1.0, 0.0))) = FALSE) THEN ('b')
    END AS new_col
FROM my_database.my_table_name

Note that in both cases the SQL syntax have been produced using show_query. Using translate_sql to produce the SQL code consistently produced much cleaner SQL syntax, but this is not the syntax that gets run on the server.

Does anyone else get these SQL queries? Any suggestions as to what is wrong and how to fix this?

Updated

Posted as an issue on the tidyverse and was informed that a solution has already been developed for case_when(..., TRUE ~ "b") being translated to ELSE 'b' (here).

However, as this does not address the syntax that is causing this exception. Question edited to focus on syntax that is the cause.

Update 2

Posted as an issue on dbplyr. A response from Christophe Dervieux (cderv) identifies that the cause appears to be that SQL server needs a special translation for case_when like it has for ifelse.

In the mean time, users can use multiple ifelse or if_else statements.

Upvotes: 5

Views: 1670

Answers (2)

Simon.S.A.
Simon.S.A.

Reputation: 6931

See Update 2 above:

This appears to be an issue effecting case_when for SQL server.

The work-around at present is to use multiple ifelse or if_else statements:

data %>%
    mutate(new_col = ifelse(condition1, val1, NA)) %>%
    mutate(new_col = ifelse(is.na(new_col) & condition2, val2, new_col)) %>%
    mutate(new_col = ifelse(is.na(new_col) & condition3, val3, new_col))
    # etc

Upvotes: 0

Shinobi_Atobe
Shinobi_Atobe

Reputation: 1973

Is it just that you have got your dplyr syntax slightly wrong?

Try this

# attempted query
tmp = my_table %>%
    mutate(new_col = case_when(col1 == col2 ~ "a",
                               col1 == 'TRUE' ~ "b"
              # alternatively  col1 == 1 ~ "b"
)) 

Upvotes: 0

Related Questions