I am new to the purrr package, but I like the little I know about it.
Using only tidyverse packages, I would like to be able to add a column that is the result of a function applied to a subset of columns in a dataset.
Here is some toy data: a series of factor columns.
df <- data.frame(a_1 = factor(rep(letters[1:3], times = 5)),
                 a_2 = factor(rep(letters[1:3], times = 5)),
                 a_3 = factor(rep(letters[1:3], times = 5)),
                 b_1 = factor(rep(letters[1:3], times = 5)),
                 b_2 = factor(rep(letters[1:3], times = 5)),
                 b_3 = factor(rep(letters[1:3], times = 5)))
df
# output
# a_1 a_2 a_3 b_1 b_2 b_3
# 1 a a a a a a
# 2 b b b b b b
# 3 c c c c c c
# 4 a a a a a a
# 5 b b b b b b
# 6 c c c c c c
# 7 a a a a a a
# 8 b b b b b b
# 9 c c c c c c
# 10 a a a a a a
# 11 b b b b b b
# 12 c c c c c c
# 13 a a a a a a
# 14 b b b b b b
# 15 c c c c c c
The following code, via purrr::map_df and dplyr::select, takes the columns of df that start with a_, converts them to numeric, multiplies them by 3, and then takes the mean across those columns for each row.
library(dplyr)  # for %>% and select()
rowMeans(purrr::map_df(.x = df %>% select(grep("^a_", names(.))),
                       .f = as.numeric) * 3)
# output
# [1] 3 6 9 3 6 9 3 6 9 3 6 9 3 6 9
This is the correct output, but it is a vector.
Using a tidyverse function, how do I add the result to the existing df dataset as a new column instead?
I assume it involves dplyr::mutate, but I can't work it out.
Upvotes: 2
You could use pmap_dbl:
library(dplyr)
library(purrr)
df %>%
  mutate(mean_vec = pmap_dbl(select(., starts_with('a_')),
                             ~mean(as.numeric(c(...)) * 3)))
# a_1 a_2 a_3 b_1 b_2 b_3 mean_vec
#1 a a a a a a 3
#2 b b b b b b 6
#3 c c c c c c 9
#4 a a a a a a 3
#5 b b b b b b 6
#6 c c c c c c 9
#7 a a a a a a 3
#8 b b b b b b 6
#9 c c c c c c 9
#10 a a a a a a 3
#11 b b b b b b 6
#12 c c c c c c 9
#13 a a a a a a 3
#14 b b b b b b 6
#15 c c c c c c 9
Or another option, which also converts the a_ columns in df itself to numeric:
df %>%
  mutate_at(vars(starts_with('a_')), as.numeric) %>%
  mutate(mean_vec = rowMeans(select(., starts_with('a_')) * 3))
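If you are on dplyr 1.0.0 or later, where mutate_at is superseded by across(), the same idea can probably be written in a single mutate call. This is only a sketch, not something from the question: the helper vector lv is just a shorthand for rebuilding the toy data. Note that as.numeric() on a factor returns its integer level codes (a = 1, b = 2, c = 3), which is what all of the approaches here rely on.

```r
library(dplyr)

# Toy data with the same shape as in the question (lv is a hypothetical helper)
lv <- factor(rep(letters[1:3], times = 5))
df <- data.frame(a_1 = lv, a_2 = lv, a_3 = lv,
                 b_1 = lv, b_2 = lv, b_3 = lv)

# across() picks the a_ columns and converts them to level codes only inside
# this expression, so the a_ columns stored in the result stay factors
res <- df %>%
  mutate(mean_vec = rowMeans(across(starts_with("a_"), as.numeric)) * 3)

res$mean_vec
```

Unlike the mutate_at version, this leaves a_1, a_2 and a_3 unchanged and only adds mean_vec.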
Upvotes: 2