I need to sum columns in a data frame, where the columns to be summed are defined in a separate data frame. A reproducible example is below:
library(dplyr)  # provides tibble(), mutate() and %>%

dataset <- tibble(L1 = runif(100, 0, 1),
                  L2 = runif(100, 0, 1),
                  L3 = runif(100, 0, 1),
                  L4 = runif(100, 0, 1))
cols_to_sum <- tibble(col1 = c("L1","L2"),
                      col2 = c("L3","L4"))
In the example above, I need to create two additional columns in dataset: one called "L1L3", which is the sum of L1 and L3, and similarly "L2L4" for L2 and L4. In other words, each row of cols_to_sum names a pair of columns to add together. The desired output should look like the data frame below. cols_to_sum could have any number of rows, and dataset could have any number of columns.
dataset <- tibble(L1 = runif(100, 0, 1),
                  L2 = runif(100, 0, 1),
                  L3 = runif(100, 0, 1),
                  L4 = runif(100, 0, 1)) %>%
  mutate(L1L3 = L1 + L3,
         L2L4 = L2 + L4)
Answer:
Here is one base R solution. It pastes together the names of the columns you want to sum to build the new column names, then uses subsetting and rowSums() within lapply() to add up the columns:
dataset[sapply(cols_to_sum, paste0, collapse = "")] <- lapply(cols_to_sum, function(x) rowSums(dataset[x]))
dataset
# A tibble: 100 x 6
L1 L2 L3 L4 L1L2 L3L4
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.915 0.626 0.885 0.484 1.54 1.37
2 0.937 0.217 0.517 0.445 1.15 0.962
3 0.286 0.217 0.852 0.0604 0.503 0.912
4 0.830 0.389 0.443 0.328 1.22 0.770
5 0.642 0.942 0.158 0.878 1.58 1.04
6 0.519 0.963 0.442 0.931 1.48 1.37
7 0.737 0.740 0.968 0.392 1.48 1.36
8 0.135 0.733 0.485 0.159 0.868 0.643
9 0.657 0.536 0.252 0.320 1.19 0.572
10 0.705 0.00227 0.260 0.307 0.707 0.567
Data:
library(tibble)

set.seed(42)
dataset <- tibble(L1 = runif(100, 0, 1),
                  L2 = runif(100, 0, 1),
                  L3 = runif(100, 0, 1),
                  L4 = runif(100, 0, 1))
cols_to_sum <- tibble(col1 = c("L1","L2"),
                      col2 = c("L3","L4"))
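Note that lapply() over a tibble (or data frame) iterates over its columns, so the code above sums the entries of col1 (L1 + L2) and of col2 (L3 + L4), which is why the output shows L1L2 and L3L4 rather than the L1L3 and L2L4 asked for in the question. A minimal base R sketch that iterates over the rows of cols_to_sum instead could look like this. It assumes the pairs are stored one per row, as in the question, and uses a plain data.frame with 5 rows so it runs without tibble; the helper names pairs and new_names are illustrative only.

```r
# Sketch: one new column per ROW of cols_to_sum (L1 + L3, L2 + L4).
set.seed(42)
dataset <- data.frame(L1 = runif(5), L2 = runif(5),
                      L3 = runif(5), L4 = runif(5))
cols_to_sum <- data.frame(col1 = c("L1", "L2"),
                          col2 = c("L3", "L4"),
                          stringsAsFactors = FALSE)

# Hypothetical helper: a list of character vectors, one per row of cols_to_sum
pairs <- lapply(seq_len(nrow(cols_to_sum)),
                function(i) unlist(cols_to_sum[i, ], use.names = FALSE))

# Name each new column by pasting its source columns together ("L1L3", "L2L4")
new_names <- vapply(pairs, paste0, character(1), collapse = "")

# rowSums() over the selected columns gives the element-wise sum for each pair
dataset[new_names] <- lapply(pairs, function(x) rowSums(dataset[x]))

head(dataset)
```

Because each row is turned into a character vector of arbitrary length, the same code should also handle rows that list more than two columns.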
Answer:
A more sequential approach is to write a function that takes the expression you want to evaluate as a character string and parses it, as in here. The code would be as follows:
library(dplyr)  # mutate(), %>%
library(rlang)  # parse_expr(), !!, :=
# Create the function: parse the expression string and store the result in new_var
example_fun <- function(df, new_var, expression) {
  df %>%
    mutate(!! new_var := !! parse_expr(expression))
}
dataset <- tibble(L1 = runif(100, 0, 1),
                  L2 = runif(100, 0, 1),
                  L3 = runif(100, 0, 1),
                  L4 = runif(100, 0, 1))
# Convert to a plain data.frame so that cols_to_sum[, i] returns a character vector
cols_to_sum <- tibble(col1 = c("L1","L2"),
                      col2 = c("L3","L4")) %>% as.data.frame()
# Apply the summing rule column by column
for (i in seq_len(ncol(cols_to_sum))) {
  expressionsum <- paste(as.character(cols_to_sum[, i]), collapse = "+")  # e.g. "L1+L2"
  Newvar <- paste(as.character(cols_to_sum[, i]), collapse = "")          # e.g. "L1L2"
  dataset <- example_fun(dataset, Newvar, expressionsum)
}
dataset
# # A tibble: 100 x 6
# L1 L2 L3 L4 L1L2 L3L4
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 0.550 0.209 0.331 0.000826 0.759 0.332
# 2 0.503 0.587 0.918 0.0305 1.09 0.948
# 3 0.0269 0.223 0.310 0.539 0.250 0.850
# 4 0.622 0.0543 0.887 0.322 0.676 1.21
# 5 0.748 0.784 0.830 0.0694 1.53 0.899
# 6 0.374 0.416 0.688 0.520 0.791 1.21
# 7 0.524 0.603 0.884 0.0563 1.13 0.941
# 8 0.774 0.640 0.117 0.0622 1.41 0.180
# 9 0.954 0.868 0.809 0.429 1.82 1.24
# 10 0.606 0.833 0.310 0.894 1.44 1.20
# # … with 90 more rows
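The loop above walks over the columns of cols_to_sum, which is why it produces L1L2 and L3L4. As a sketch of the same build-a-string-then-evaluate idea in base R, iterating over rows to get L1L3 and L2L4, one could swap rlang::parse_expr() and mutate() for base parse() and eval() with the data frame as the evaluation environment. This is an illustrative substitute, not the answer's method; it assumes syntactic column names (non-syntactic names would need backticks in the string), and the variable names vars, expr_text and new_var are hypothetical.

```r
# Sketch: base R parse()/eval() version, one new column per ROW of cols_to_sum
set.seed(42)
dataset <- data.frame(L1 = runif(5), L2 = runif(5),
                      L3 = runif(5), L4 = runif(5))
cols_to_sum <- data.frame(col1 = c("L1", "L2"),
                          col2 = c("L3", "L4"),
                          stringsAsFactors = FALSE)

for (i in seq_len(nrow(cols_to_sum))) {
  vars <- unlist(cols_to_sum[i, ], use.names = FALSE)  # e.g. c("L1", "L3")
  expr_text <- paste(vars, collapse = " + ")            # e.g. "L1 + L3"
  new_var <- paste(vars, collapse = "")                 # e.g. "L1L3"
  # parse() turns the string into a call; eval() looks the column names up in dataset
  dataset[[new_var]] <- eval(parse(text = expr_text)[[1]], envir = dataset)
}

names(dataset)
```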
Answer:
One option involving dplyr and purrr could be:
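library(dplyr)
library(purrr)

# asplit(cols_to_sum, 1) yields one element per row, e.g. c("L1", "L3")
map_dfc(.x = asplit(cols_to_sum, 1), ~ dataset %>%
          mutate(!!paste(paste(.x, collapse = "_"), "sum", sep = "_") := rowSums(select(., .x))) %>%
          select(ends_with("sum"))) %>%
  bind_cols(dataset)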
L1_L3_sum L2_L4_sum L1 L2 L3 L4
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1.42 1.79 0.621 0.878 0.802 0.908
2 0.944 1.39 0.135 0.527 0.809 0.864
3 1.16 0.859 0.607 0.361 0.555 0.498
4 1.71 1.10 0.982 0.853 0.729 0.252
5 0.856 0.950 0.287 0.0234 0.568 0.927
6 0.235 1.16 0.00368 0.363 0.232 0.801
7 1.27 1.24 0.516 0.601 0.755 0.637
8 1.37 1.38 0.486 0.914 0.882 0.465
9 0.368 1.12 0.168 0.642 0.200 0.482
10 0.341 1.33 0.317 0.477 0.0240 0.857
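To see what map_dfc() receives, here is a small base R check of asplit(cols_to_sum, 1) on a plain data.frame version of cols_to_sum (an assumption made so the snippet runs without tibble; asplit() needs R 3.6.0 or later). asplit() first converts the data frame to a character matrix, then returns one element per row, which is what makes this answer pair L1 with L3 and L2 with L4 as the question asks.

```r
# What asplit(cols_to_sum, 1) hands to map_dfc(): one element per row
cols_to_sum <- data.frame(col1 = c("L1", "L2"),
                          col2 = c("L3", "L4"),
                          stringsAsFactors = FALSE)

rows <- asplit(cols_to_sum, 1)  # data frame -> character matrix -> list of rows
length(rows)                    # one element per row of cols_to_sum
as.vector(rows[[1]])            # the columns to sum for the first new column
paste(rows[[1]], collapse = "_")  # prefix of the first new column name ("..._sum")
```

Each element keeps the names col1 and col2; in the answer above those names are carried into select(), which does not affect the rowSums() result.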