Display name
Display name

Reputation: 4481

Applying regular expressions to SQL databases in R

tbl(db_name, "table_name")
tbl(db_name, "table_name") %>% show_query()
#> <SQL>
#> SELECT *
#> FROM "table_name"

I can use the dplyr tbl() function on a database, and table without issue. When I invoke the show_query() function I can see that tbl() is performing a simple SQL query in the background. This is shown above.

What's shown below is the error I get when I try and apply a regular expression and mutate a column utilizing dplyr and R (maybe dbplyr too?). FYI this is being performed in a Teradata data warehouse.

tbl(db_name, "table_name") %>% 
  mutate(col1 = gsub("^0+(?!$)", "", col1))
#> Error in new_result(connection@ptr, statement) : 
#>  nanodbc/nanodbc.cpp:1000: 22800: [Teradata][ODBC Teradata Driver]
#>  [Teradata Database](-8607)Syntax error: expected something between '(' and 
#>  the string '^0+(?!$)'.

If I show_query() you can see how the gsub() is applied, which I'm guessing is improper syntax for SQL.

tbl(db_name, "table_name") %>% 
  mutate(col1 = gsub("^0+(?!$)", "", col1)) %>% 
  show_query()
#> <SQL>
#> SELECT "Plant_Material_Id", 
#>        "Plant_Id_In_Source", 
#>        gsub('^0+(?!$)', '', "col1") AS "col1", 
#>                                        "col2",
#>                                        "col3"        
#>                                              
#> FROM "table_name"

How can I mutate a column utilizing regular expressions in R, on a SQL database (specifically Teradata in this case)?

I should also mention I want to do this without utilizing the collect() function, because I want everything done on the server side. I'm working with (relatively) large databases and loading everything into memory isn't that practical.

Upvotes: 3

Views: 1412

Answers (1)

Yuriy Barvinchenko
Yuriy Barvinchenko

Reputation: 1595

You can try to use SQL function REGEXP_REPLACE() instead of r function gsub()

tbl(db_name, "table_name") %>% 
  mutate(col1 = REGEXP_REPLACE(col1, "^0+(?!$)", "" ))

For PostgreSQL database just 'REPLACE`

Upvotes: 5

Related Questions