MatSchu
MatSchu

Reputation: 419

How can I dynamically build a string and pass it to dplyr's mutate() function in R?

I asked a similar question before (Link). The given answer works fine. However, it turns out, that it does not fully apply to my use case.

Please consider the following minimal working example:

library(RSQLite)
library(dplyr)
library(dbplyr)
library(DBI)
library(stringr)

con <- DBI::dbConnect(RSQLite::SQLite(), path = ":memory:")

copy_to(con, mtcars, "mtcars", temporary = FALSE)

db <- tbl(con, "mtcars") %>%
    select(carb) %>%
    distinct(carb) %>%
    arrange(carb) %>% 
    mutate(Q1=1, Q2=2, Q3=3, Q4=4) %>% 
    collect()

I am interested in dynamically building the string Q1=1, Q2=2, Q3=3, Q4=4 such that it could be Q1=1, Q2=2, ..., Qn = n.

One idea I had is to build the string like that:

n_par <- 4
str_c('Q', 1:n_par, ' = ', 1:n_par, collapse = ', ')

such that n_par could be any positive number. However, due to dplyr's non-standard evaluation, I cannot make it work like that. However, this is exactly what I need.

Can somebody help?

Upvotes: 2

Views: 783

Answers (3)

MatSchu
MatSchu

Reputation: 419

I recently read more about the topic and I found that the following code works quite nicely, causing dbplyr to write a cleaner SQL code.

# Libraries

library(RSQLite)
library(dplyr)
library(dbplyr)
library(DBI)

# Example database

con <- DBI::dbConnect(RSQLite::SQLite(), path = ":memory:")

copy_to(con, mtcars, "mtcars", temporary = FALSE)

# Parameter for number of variables to be created

n <- 4 

# Variable list

var <- list() 
for(i in 1:n){
    j <- paste0("Q", i) 
    var[[j]] <- i
} 

# Query/computation

db <- tbl(con, "mtcars") %>% 
    select(carb) %>% 
    distinct(carb) %>% 
    arrange(carb) %>% 
    mutate(!!! var) %>% 
    show_query() %>% 
    collect()

The trick was to build a list with proper names and to put it into the mutate() function using !!!. Furthermore, I read that parsing and evaluating strings should be avoided, so I switched to lists.

Upvotes: 2

Simon.S.A.
Simon.S.A.

Reputation: 6941

Generating and evaluating the string

Q1 = 1, Q2 = 2, Q3 = 3, Q4 = 4 is not a string in the same way that "Q1 = 1, Q2 = 2, Q3 = 3, Q4 = 4" is a string. There are some R functions that will take a string object and evaluate it as code. For example:

> eval(parse(text="print('hello world')"))

#> [1] "hello world"

However, this may not play nicely inside dbplyr translation. If you manage to get something like this approach working it would be good to see it posted as an answer.

Using a loop

Instead of doing it as a single string, an alternative is to use a loop:

db <- tbl(con, "mtcars") %>%
    select(carb) %>%
    distinct(carb) %>%
    arrange(carb)

for(i in 1:n){
    var = paste0("Q",i)
    db <- db %>%
        mutate(!!sym(var) := i)
}

db <- collect(db)

The !!sym() is required to tell dplyr that you want the text argument treated as a variable. Lazy evaluation can give you odd results without it. The := assignment is required because the LHS needs to be evaluated.

This approach is roughly equivalent to one mutate statement for each variable (example below), but the dbplyr translation might not look as elegant as doing it all within a single mutate statement.

db <- tbl(con, "mtcars") %>%
    select(carb) %>%
    distinct(carb) %>%
    arrange(carb) %>%
    mutate(Q1 = 1) %>%
    mutate(Q2 = 2) %>%
    ...
    mutate(Qn = n) %>%
    collect()

Upvotes: 2

Bruno
Bruno

Reputation: 4150

Does this work in in your database?

library(tidyverse)

q_n <- function(n) {
  str_c('Q', 1:n, ' = ', 1:n, collapse = ', ')
}

create_n_string <- function(data,n = 5,string = "Q"){
  data %>% 
    mutate(new_col = str_flatten(1:n,collapse = "_")) %>% 
    separate(new_col,into = string %>% str_c(1:n),sep = "_")
} 

mtcars %>% 
select(carb) %>%
  distinct(carb) %>%
  arrange(carb) %>%
  create_n_string()
#>   carb Q1 Q2 Q3 Q4 Q5
#> 1    1  1  2  3  4  5
#> 2    2  1  2  3  4  5
#> 3    3  1  2  3  4  5
#> 4    4  1  2  3  4  5
#> 5    6  1  2  3  4  5
#> 6    8  1  2  3  4  5

Created on 2020-01-22 by the reprex package (v0.3.0)

Upvotes: 1

Related Questions