Reputation: 780
My goal is create new variables that are composed of differences, but iterating across multiple columns. In base, this is crazy easy:
iris[, 6:7] <- iris[, 1:2] - iris[, 3:4]
Is there a way to achieve this in dplyr, perhaps using mutate?
The following code subtracts the third column from the 1st and 2nd:
iris2 <- iris %>%
mutate_at(1:2, funs(diffs = . - Petal.Length))
but what if I want to subtract the 3rd from the 1st and the 2nd from the 4th?
I'm working with rather big data applications, so why not assume I'm trying to this across a table of 1000 columns -- a manual hack is not preferable ...
Upvotes: 1
Views: 727
Reputation: 15072
Here's one way using dplyr::bind_cols
and purrr::map2
that appears to be significantly faster than base at large numbers of columns. I don't know enough about profiling to guess at why, since it feels a little more convoluted than the other options. I am not sure that it is easy to do this with mutate_
verbs, though open to correction.
EDIT: added an option with dplyr::do
, which is the "intended" way of doing computation that doesn't fit neatly inside a mutate
function. The problem with mutate
is that it expects to create exactly one column. I think other than using a map to construct individual mutate calls, which I cannot imagine will be faster, this is the best option.
library(tidyverse)
set.seed(4321)
df <- matrix(rnorm(1000000), ncol = 1000) %>%
as_tibble()
microbenchmark::microbenchmark(
base = df[, 1001:1500] <- df[, 1:500] - df[, 501:1000],
base2 = df %>% magrittr::inset(, 1001:1500, .[, 1:500] - .[, 501:1000]),
map = df %>% bind_cols(map2(.x = .[, 1:500], .y = .[, 501:1000], .f = ~.x - .y)),
nomap = df %>% bind_cols(.[, 1:500] - .[, 501:1000]),
do = df %>% do(.[, 1:500] - .[, 501:1000])
)
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> base 32.928171 36.394238 39.362308 37.361149 39.454822 112.76356 100
#> base2 33.302556 35.500491 38.888530 37.433863 40.207799 84.08674 100
#> map 4.693637 5.139985 5.967655 5.468398 6.264793 12.20658 100
#> nomap 23.061348 25.016053 28.598282 26.973913 29.574478 79.97451 100
#> do 21.906042 23.460822 27.049262 25.135640 26.596373 80.01928 100
#> cld
#> c
#> c
#> a
#> b
#> b
Created on 2018-05-11 by the reprex package (v0.2.0).
Upvotes: 2