Reputation: 419
I am new to Stack Overflow and quite new to R. I would really appreciate your help.
I am using dplyr's mutate() function to create a set of new columns based on one initial column. When the number of columns to create is known in advance, everything works fine.
However, in my application the number of new columns is not known in advance: it is an input parameter set before the code runs.
For illustration, consider the following minimal working example:
library(RSQLite)
library(dplyr)
library(dbplyr)
library(DBI)
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
copy_to(con, mtcars, "mtcars", temporary = FALSE)
db <- tbl(con, "mtcars") %>%
  select(carb) %>%
  distinct(carb) %>%
  arrange(carb) %>%
  mutate(carb1 = carb + 1) %>%
  mutate(carb2 = carb + 2) %>%
  mutate(carb3 = carb + 3) %>%
  show_query() %>%
  collect()
In this example, I create three new variables. However, I want the program to work with a dynamic number of variables (e.g., five or ten new variables). I would also like to do all of the calculations before collect(), because I want to copy the data into memory as late as possible.
Some background on my real-life application: I want to use DB2's ADD_MONTHS() function, so I need dplyr/dbplyr to pass that function through directly into the generated SQL. I therefore need a solution that does not rely on data frame logic; it has to stay in dplyr.
From a different perspective: In SAS I'd use the macro processor to dynamically build a proc sql statement. Is there an equivalent in R?
Upvotes: 2
Views: 643
Reputation: 887028
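We can use map_dfc() from purrr:
library(dplyr)
library(purrr)
library(stringr)
# Example input (assumed; it matches the output shown below)
df <- data.frame(x = 1:3)
map_dfc(1:3, ~ df %>%
    transmute(!! str_c('x', .x) := x + .x)) %>%
  bind_cols(df, .)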
# x x1 x2 x3
#1 1 2 3 4
#2 2 3 4 5
#3 3 4 5 6
For a database table, call collect() before adding the columns. Note that this version uses the distinct carb values themselves as the offsets:
dat <- tbl(con, "mtcars") %>%
  select(carb) %>%
  distinct(carb) %>%
  arrange(carb) %>%
  collect()
map_dfc(dat$carb, ~ dat %>%
    transmute(!! str_c('carb', .x) := carb + .x)) %>%
  bind_cols(dat, .)
# A tibble: 6 x 7
# carb carb1 carb2 carb3 carb4 carb6 carb8
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 1 2 3 4 5 7 9
#2 2 3 4 5 6 8 10
#3 3 4 5 6 7 9 11
#4 4 5 6 7 8 10 12
#5 6 7 8 9 10 12 14
#6 8 9 10 11 12 14 16
Another option, if we want to do this before collecting, is to parse the expressions from strings and splice them into mutate() with !!!:
tbl(con, "mtcars") %>%
  select(carb) %>%
  distinct(carb) %>%
  arrange(carb) %>%
  mutate(!!! rlang::parse_exprs(str_c('carb', 1:3, sep = "+", collapse = ";"))) %>%
  rename_at(-1, ~ str_c('carb', 1:3)) %>%
  show_query() %>%
  collect()
#<SQL>
#SELECT `carb`, `carb` + 1.0 AS `carb1`, `carb` + 2.0 AS `carb2`, `carb` + 3.0 AS `carb3`
#FROM (SELECT *
#FROM (SELECT DISTINCT *
#FROM (SELECT `carb`
#FROM `mtcars`))
#ORDER BY `carb`)
# A tibble: 6 x 4
# carb carb1 carb2 carb3
# <dbl> <dbl> <dbl> <dbl>
#1 1 2 3 4
#2 2 3 4 5
#3 3 4 5 6
#4 4 5 6 7
#5 6 7 8 9
#6 8 9 10 11
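A variant of the same idea that avoids building and parsing strings is to construct a named list of expressions and splice it into mutate(). The following is only a sketch under a few assumptions: `n`, `offsets`, `new_cols`, `start_date` and `month_cols` are illustrative names that do not appear in the question, and the second half uses dbplyr's simulated connection just to show the SQL that would be generated, not a real DB2 backend. Since dbplyr generally passes functions it does not recognise through to the SQL unchanged, the same pattern may also cover the ADD_MONTHS() case.

```r
library(dplyr)
library(dbplyr)

# Same in-memory SQLite setup as in the question.
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
copy_to(con, mtcars, "mtcars", temporary = FALSE)

n <- 5  # assumed input parameter: how many new columns to create

# Named list of unevaluated expressions: carb1 = carb + 1, carb2 = carb + 2, ...
offsets <- setNames(seq_len(n), paste0("carb", seq_len(n)))
new_cols <- lapply(offsets, function(i) rlang::expr(carb + !!i))

query <- tbl(con, "mtcars") %>%
  distinct(carb) %>%
  arrange(carb) %>%
  mutate(!!!new_cols)  # the list names become the new column names

show_query(query)         # the additions are part of the generated SQL
result <- collect(query)  # data is only pulled into memory here

# Same pattern with a function dbplyr does not translate (ADD_MONTHS),
# rendered against a simulated connection instead of a real database.
dates <- lazy_frame(start_date = "2020-01-31", con = simulate_dbi())
month_cols <- lapply(
  setNames(seq_len(n), paste0("date", seq_len(n))),
  function(i) rlang::expr(ADD_MONTHS(start_date, !!i))
)
month_sql <- sql_render(mutate(dates, !!!month_cols))
print(month_sql)

DBI::dbDisconnect(con)
```

Because the column names come from the names of the list, no rename step is needed afterwards, and changing `n` is the only edit required to get a different number of columns.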
Upvotes: 3
Reputation: 388862
We can use map2_dfc() from purrr: pass it the values to add together with df, then bind the result to the original df.
library(dplyr)
library(purrr)
bind_cols(df, map2_dfc(1:3, df, `+`))
# x V1 V2 V3
#1 1 2 3 4
#2 2 3 4 5
#3 3 4 5 6
Upvotes: 0