andrew

Reputation: 49

How to sum columns in R where the columns to sum are defined in a separate data frame

I need to sum columns in a dataframe where the columns that need to be summed are defined in a separate data frame. Reproducible example below.

library(dplyr)  # provides tibble(), mutate() and %>%

dataset <- tibble(L1 = runif(100, 0, 1),
                  L2 = runif(100, 0, 1),
                  L3 = runif(100, 0, 1),
                  L4 = runif(100, 0, 1))


cols_to_sum <- tibble(col1 = c("L1","L2"),
                      col2 = c("L3","L4"))

In the example above, I need to add two columns to dataset: "L1L3", the sum of L1 and L3, and "L2L4", the sum of L2 and L4. The desired output should look like the data frame below. cols_to_sum could have any number of rows, and dataset could have any number of columns.

dataset <- tibble(L1 = runif(100, 0, 1),
                  L2 = runif(100, 0, 1),
                  L3 = runif(100, 0, 1),
                  L4 = runif(100, 0, 1)) %>%
  mutate(L1L3 = L1 + L3,
         L2L4 = L2 + L4)

Upvotes: 1

Views: 131

Answers (3)

Andrew

Reputation: 5138

Here is one base R solution. It pastes together the names of the columns being summed to build each new column name, and uses subsetting and rowSums() within lapply() to add up the columns:

dataset[sapply(cols_to_sum, paste0, collapse = "")] <- lapply(cols_to_sum, function(x) rowSums(dataset[x]))

dataset
# A tibble: 100 x 6
      L1      L2    L3     L4  L1L2  L3L4
   <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>
 1 0.915 0.626   0.885 0.484  1.54  1.37 
 2 0.937 0.217   0.517 0.445  1.15  0.962
 3 0.286 0.217   0.852 0.0604 0.503 0.912
 4 0.830 0.389   0.443 0.328  1.22  0.770
 5 0.642 0.942   0.158 0.878  1.58  1.04 
 6 0.519 0.963   0.442 0.931  1.48  1.37 
 7 0.737 0.740   0.968 0.392  1.48  1.36 
 8 0.135 0.733   0.485 0.159  0.868 0.643
 9 0.657 0.536   0.252 0.320  1.19  0.572
10 0.705 0.00227 0.260 0.307  0.707 0.567

Data:

set.seed(42)

dataset <- tibble(L1 = runif(100, 0, 1),
                  L2 = runif(100, 0, 1),
                  L3 = runif(100, 0, 1),
                  L4 = runif(100, 0, 1))


cols_to_sum <- tibble(col1 = c("L1","L2"),
                      col2 = c("L3","L4"))
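The one-liner above treats each column of cols_to_sum as one group, which is why the output shows L1L2 and L3L4. If each row should define a group instead, as with the L1L3 and L2L4 columns in the question, a row-wise variant might look like the sketch below. The toy values are made up for illustration; only the column names come from the question.

```r
# Toy data: values are illustrative assumptions, not from the question
dataset <- data.frame(L1 = c(1, 2), L2 = c(10, 20),
                      L3 = c(100, 200), L4 = c(1000, 2000))
cols_to_sum <- data.frame(col1 = c("L1", "L2"),
                          col2 = c("L3", "L4"),
                          stringsAsFactors = FALSE)

# Each ROW of cols_to_sum names one group of columns to add together
for (i in seq_len(nrow(cols_to_sum))) {
  cols <- unlist(cols_to_sum[i, ], use.names = FALSE)  # e.g. c("L1", "L3")
  dataset[[paste0(cols, collapse = "")]] <- rowSums(dataset[cols])
}

names(dataset)
```

Because the loop runs over rows, it should cope with any number of rows in cols_to_sum and any number of name columns per row.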

Upvotes: 0

Carles

Reputation: 2829

A more sequential approach is to write a function that takes the new column name and the expression to evaluate, both as character strings. The code would be as follows:

library(dplyr)  # tibble(), mutate(), %>%
library(rlang)  # parse_expr()

# Add a column named `new_var`, computed from the string `expression`
example_fun <- function(df, new_var, expression) {
  df %>%
    mutate(!! new_var := !! parse_expr(expression))
}

dataset <- tibble(L1 = runif(100, 0, 1),
                  L2 = runif(100, 0, 1),
                  L3 = runif(100, 0, 1),
                  L4 = runif(100, 0, 1))

# Convert to a plain data frame so that cols_to_sum[, i] returns a vector
cols_to_sum <- tibble(col1 = c("L1","L2"),
                      col2 = c("L3","L4"))%>% as.data.frame()

# For each column of cols_to_sum, sum the columns it names
for (i in seq_len(ncol(cols_to_sum))) {
  expressionsum <- paste(as.character(cols_to_sum[, i]), collapse = "+")
  Newvar <- paste(as.character(cols_to_sum[, i]), collapse = "")
  dataset <- example_fun(dataset, Newvar, expressionsum)
}

dataset
# # A tibble: 100 x 6
#        L1     L2    L3       L4  L1L2  L3L4
#     <dbl>  <dbl> <dbl>    <dbl> <dbl> <dbl>
#  1 0.550  0.209  0.331 0.000826 0.759 0.332
#  2 0.503  0.587  0.918 0.0305   1.09  0.948
#  3 0.0269 0.223  0.310 0.539    0.250 0.850
#  4 0.622  0.0543 0.887 0.322    0.676 1.21
#  5 0.748  0.784  0.830 0.0694   1.53  0.899
#  6 0.374  0.416  0.688 0.520    0.791 1.21
#  7 0.524  0.603  0.884 0.0563   1.13  0.941
#  8 0.774  0.640  0.117 0.0622   1.41  0.180
#  9 0.954  0.868  0.809 0.429    1.82  1.24
# 10 0.606  0.833  0.310 0.894    1.44  1.20
# # … with 90 more rows
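To see what parse_expr() contributes here, it may help to look at it in isolation: it turns a string such as "L1 + L3" into an unevaluated call, which can then be evaluated against a data frame. The sketch below uses rlang's eval_tidy() for that evaluation step in place of mutate(); the toy data frame and its values are assumptions for illustration only.

```r
library(rlang)

# Toy data: values are illustrative assumptions
toy <- data.frame(L1 = c(1, 2), L3 = c(10, 20))

# Turn the string into an unevaluated call: L1 + L3
expr <- parse_expr("L1 + L3")

# Evaluate the call with the columns of `toy` in scope
result <- eval_tidy(expr, data = toy)
result
# [1] 11 22
```

Inside mutate(), the !! operator splices the same kind of parsed call into the pipeline, so the sum is computed column-wise on the full dataset.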

Upvotes: 0

tmfmnk

Reputation: 39868

One option involving dplyr and purrr could be:

map_dfc(.x = asplit(cols_to_sum, 1), ~ dataset %>%
         mutate(!!paste(paste(.x, collapse = "_"), "sum", sep = "_") := rowSums(select(., .x))) %>%
         select(ends_with("sum"))) %>%
 bind_cols(dataset)

   L1_L3_sum L2_L4_sum      L1     L2     L3    L4
       <dbl>     <dbl>   <dbl>  <dbl>  <dbl> <dbl>
 1     1.42      1.79  0.621   0.878  0.802  0.908
 2     0.944     1.39  0.135   0.527  0.809  0.864
 3     1.16      0.859 0.607   0.361  0.555  0.498
 4     1.71      1.10  0.982   0.853  0.729  0.252
 5     0.856     0.950 0.287   0.0234 0.568  0.927
 6     0.235     1.16  0.00368 0.363  0.232  0.801
 7     1.27      1.24  0.516   0.601  0.755  0.637
 8     1.37      1.38  0.486   0.914  0.882  0.465
 9     0.368     1.12  0.168   0.642  0.200  0.482
10     0.341     1.33  0.317   0.477  0.0240 0.857
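The asplit(cols_to_sum, 1) call is what makes this answer work row by row: it splits the lookup table into one character vector per row, and each vector is then passed to the formula as .x. A minimal base R sketch of that step alone, assuming the same cols_to_sum as in the question (stored here as a plain data frame and converted to a matrix explicitly):

```r
cols_to_sum <- data.frame(col1 = c("L1", "L2"),
                          col2 = c("L3", "L4"),
                          stringsAsFactors = FALSE)

# Split by rows (MARGIN = 1): one element per row of cols_to_sum
row_groups <- asplit(as.matrix(cols_to_sum), 1)

# Each element holds the column names to sum for one new column
first_group  <- as.vector(row_groups[[1]])
second_group <- as.vector(row_groups[[2]])

# Column name built the same way as in the answer above
paste(paste(first_group, collapse = "_"), "sum", sep = "_")
```

Splitting with MARGIN = 2 instead would group by the columns of cols_to_sum, which is the grouping the other two answers produce.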

Upvotes: 1
