TOMC
TOMC

Reputation: 163

Create new empty columns in R with dplyr

Issue

I would like to create new empty columns with dplyr, based on a vector containing the new variables names. The columns I would like to create will contain only "0" for example. It is probably very easy but I can't find a solution. Ideally, i would use dplyr::mutates function (mutate, mutate_at...), because I need a solution that also works with spark dataframes. I would normally use mutate_at() but it only works if the columns already exist.

Note to those saying it's a duplicate

I can't use tibble::add_column(). I unfortunately need a solution that also works with spark dataframes

Reproducible example

library("dplyr")

# --- trying to create 3 new columns
# I would like to do something like this : 
new_vars = c("var1", "var2", "var3")
mtcars %>%
  mutate_at(.vars = new_vars, .funs = ~ 0)
but it generates the error :
Error in `tbl_at_vars()`:
! Can't subset columns that don't exist.
✖ Column `var1` doesn't exist.

of course, it works fine if the columns already exist

mtcars %>% 
  mutate_at(.vars = c('mpg', 'cyl'), .funs = ~ 0)

obvious data.frame solution but I need a solution that also works with spark dataframes

new_vars = c("var1", "var2", "var3")
mtcars[, new_vars]  = 0

obvious data.table solution but I need a solution that also works with spark dataframes

library("data.table")
new_vars = c("var1", "var2", "var3")
mtcars_dt = as.data.table(mtcars)
mtcars_dt[, (new_vars) := 0]

Thank you.

Upvotes: 0

Views: 616

Answers (1)

r2evans
r2evans

Reputation: 160387

I don't have spark available, but does an arrow connection closely-enough approximate what is needed?

arr <- arrow::arrow_table(mtcars)
Reduce(function(dat, this) mutate(dat, {{this}} := 0), new_vars, init=arr) %>%
  collect()
# # A tibble: 32 × 14
#      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb  var1  var2  var3
#    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4     0     0     0
#  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4     0     0     0
#  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1     0     0     0
#  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1     0     0     0
#  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2     0     0     0
#  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1     0     0     0
#  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4     0     0     0
#  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2     0     0     0
#  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2     0     0     0
# 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4     0     0     0
# # ℹ 22 more rows
# # ℹ Use `print(n = ...)` to see more rows

Upvotes: 1

Related Questions