R - Sum Values That Match A Pattern/Character in Several Rows Across Several Columns

Question

I am looking to sum the values in each of the 'M' columns for rows that share the same character in any of the 'Y' columns. For example, if my data frame looks like this:

X     M.1    M.2    M.3    Y.1     Y.2     Y.3
K3    21     6      11     L       N       X   
K8    31     1      29     N                         
K2    8      0      2      L       Q       Z

I would like to get this output data frame:

Y     M.1    M.2    M.3
L     29     6      13
N     52     7      40
Q      8     0      2
X     21     6      11

Bonus if it can also collect, into one column, all the values from column X whose row contains that character in one of the 'Y' columns, like this:

Y     M.1    M.2    M.3    X.all
L     29     6      13     K3,K2
N     52     7      40     K3,K8
Q      8     0      2      K2
X     21     6      11     K3

So far, using the aggregate() call below, I can get the sums for one 'M' column grouped by one 'Y' column at a time, but I'd appreciate a better way to build an entirely new data frame with all the sums together:

aggregate(M.1 ~ Y.1, data = df, FUN = sum)
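For comparison, here is a base-R sketch (not from the original post) that extends this aggregate() approach to all three M columns at once: stack Y.1 to Y.3 into a single Y column, then use cbind() on the left-hand side of the formula. It assumes the blank Y cells are NA; the names y_cols, long, sums and x_all are just illustrative. Note that it also returns a Z row, because K2 has Z in Y.3.

```r
# Example data from the question (assumption: blank Y cells are NA)
df <- data.frame(
  X   = c("K3", "K8", "K2"),
  M.1 = c(21, 31, 8),
  M.2 = c(6, 1, 0),
  M.3 = c(11, 29, 2),
  Y.1 = c("L", "N", "L"),
  Y.2 = c("N", NA, "Q"),
  Y.3 = c("X", NA, "Z"),
  stringsAsFactors = FALSE
)

# Stack Y.1, Y.2 and Y.3 into one column Y, repeating each row's X and M values
y_cols <- c("Y.1", "Y.2", "Y.3")
long <- do.call(rbind, lapply(y_cols, function(col) {
  data.frame(df[c("X", "M.1", "M.2", "M.3")], Y = df[[col]],
             stringsAsFactors = FALSE)
}))

# Sum every M column per Y; the formula method drops rows where Y is NA
sums <- aggregate(cbind(M.1, M.2, M.3) ~ Y, data = long, FUN = sum)

# Bonus: sorted, comma-separated X values for each Y
x_all <- aggregate(X ~ Y, data = long,
                   FUN = function(x) paste(sort(unique(x)), collapse = ","))
names(x_all)[2] <- "X.all"

# Join the sums and the X.all column on Y
result <- merge(sums, x_all, by = "Y")
result
```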

Thanks so much for any help with this!

camille · Accepted Answer

If you want to use tidyverse functions, you can do the wrangling in a few steps. I'm breaking it down so you can see the intermediate results.

About missing values: how to handle them is up to you. You didn't share your data with dput(), so I read it in as text with readr::read_table2, which automatically converts the blanks to NA. Here I'm keeping those missing values.
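If you'd rather not parse the text at all, a sketch that builds the same data frame directly is below. It assumes the blank Y cells should be NA, as read_table2 produced. Running dput(df) on the result gives text you can paste into a question so others get exactly the same data.

```r
# Reconstruction of the question's data; blank Y cells are assumed to be NA
df <- data.frame(
  X   = c("K3", "K8", "K2"),
  M.1 = c(21, 31, 8),
  M.2 = c(6, 1, 0),
  M.3 = c(11, 29, 2),
  Y.1 = c("L", "N", "L"),
  Y.2 = c("N", NA, "Q"),
  Y.3 = c("X", NA, "Z"),
  stringsAsFactors = FALSE
)

# Inspect column types and the NA placement
str(df)
```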

First, tidyr::gather reshapes the data into long format, stacking Y.1, Y.2, and Y.3 into a single column Y (with the original column names in key):

library(dplyr)
library(tidyr)

df %>%
  gather(key, value = Y, Y.1:Y.3) %>%
  head()
#> # A tibble: 6 x 6
#>   X       M.1   M.2   M.3 key   Y    
#>   <chr> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 K3       21     6    11 Y.1   L    
#> 2 K8       31     1    29 Y.1   N    
#> 3 K2        8     0     2 Y.1   L    
#> 4 K3       21     6    11 Y.2   N    
#> 5 K8       31     1    29 Y.2   <NA> 
#> 6 K2        8     0     2 Y.2   Q
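In current tidyr, gather() is superseded by pivot_longer(). A roughly equivalent first step is sketched below, assuming tidyr >= 1.0.0 and the question's data with blanks as NA. One difference: pivot_longer() orders rows by original row (all three of K3's Y values first) rather than column by column as gather() does, so head() shows different rows.

```r
library(dplyr)
library(tidyr)

# Same example data as the question (blank Y cells assumed to be NA)
df <- data.frame(
  X   = c("K3", "K8", "K2"),
  M.1 = c(21, 31, 8),
  M.2 = c(6, 1, 0),
  M.3 = c(11, 29, 2),
  Y.1 = c("L", "N", "L"),
  Y.2 = c("N", NA, "Q"),
  Y.3 = c("X", NA, "Z"),
  stringsAsFactors = FALSE
)

# Y.1:Y.3 become (key, Y) pairs; X and the M columns are repeated per pair
long_y <- df %>%
  pivot_longer(Y.1:Y.3, names_to = "key", values_to = "Y")

head(long_y)
```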

A second gather does the same for M.1, M.2, and M.3, so the Ys and the Ms each end up in a single column:

df %>%
  gather(key, value = Y, Y.1:Y.3) %>%
  gather(key2, value = M, M.1:M.3) %>%
  head()
#> # A tibble: 6 x 5
#>   X     key   Y     key2      M
#>   <chr> <chr> <chr> <chr> <dbl>
#> 1 K3    Y.1   L     M.1      21
#> 2 K8    Y.1   N     M.1      31
#> 3 K2    Y.1   L     M.1       8
#> 4 K3    Y.2   N     M.1      21
#> 5 K8    Y.2   <NA>  M.1      31
#> 6 K2    Y.2   Q     M.1       8

Then group by Y, create a column with the pasted-together X values (such as K2,K3), and add up the numeric values. I put x.all in the grouping so it wouldn't get dropped when summarizing.

df %>%
  gather(key, value = Y, Y.1:Y.3) %>%
  gather(key2, value = M, M.1:M.3) %>%
  group_by(Y) %>%
  mutate(x.all = sort(X) %>% unique() %>% paste(collapse = ",")) %>%
  group_by(Y, key2, x.all) %>%
  summarise(sum = sum(M, na.rm = TRUE)) %>%
  head()
#> # A tibble: 6 x 4
#> # Groups:   Y, key2 [6]
#>   Y     key2  x.all   sum
#>   <chr> <chr> <chr> <dbl>
#> 1 L     M.1   K2,K3    29
#> 2 L     M.2   K2,K3     6
#> 3 L     M.3   K2,K3    13
#> 4 N     M.1   K3,K8    52
#> 5 N     M.2   K3,K8     7
#> 6 N     M.3   K3,K8    40
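An alternative to carrying x.all through group_by() is to compute it inside summarise() itself. This variation is my own sketch, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0 (for the .groups argument). It relies on the second reshape repeating every X once per M column, so each (Y, key2) group holds the same X values as the whole Y group and the pasted strings come out the same.

```r
library(dplyr)
library(tidyr)

# Same example data as the question (blank Y cells assumed to be NA)
df <- data.frame(
  X   = c("K3", "K8", "K2"),
  M.1 = c(21, 31, 8),
  M.2 = c(6, 1, 0),
  M.3 = c(11, 29, 2),
  Y.1 = c("L", "N", "L"),
  Y.2 = c("N", NA, "Q"),
  Y.3 = c("X", NA, "Z"),
  stringsAsFactors = FALSE
)

# Long format with one row per (X, Y, M column) combination
long <- df %>%
  pivot_longer(Y.1:Y.3, names_to = "key", values_to = "Y") %>%
  pivot_longer(M.1:M.3, names_to = "key2", values_to = "M")

# Sum M and paste the sorted, unique X values in one summarise() call
summed <- long %>%
  group_by(Y, key2) %>%
  summarise(
    sum = sum(M, na.rm = TRUE),
    x.all = paste(sort(unique(X)), collapse = ","),
    .groups = "drop"
  )

summed
```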

Then bring it back into a wide shape with columns for the different M variables:

df %>%
  gather(key, value = Y, Y.1:Y.3) %>%
  gather(key2, value = M, M.1:M.3) %>%
  group_by(Y) %>%
  mutate(x.all = sort(X) %>% unique() %>% paste(collapse = ",")) %>%
  group_by(Y, key2, x.all) %>%
  summarise(sum = sum(M, na.rm = TRUE)) %>%
  spread(key = key2, value = sum)
#> # A tibble: 6 x 5
#> # Groups:   Y [6]
#>   Y     x.all   M.1   M.2   M.3
#>   <chr> <chr> <dbl> <dbl> <dbl>
#> 1 L     K2,K3    29     6    13
#> 2 N     K3,K8    52     7    40
#> 3 Q     K2        8     0     2
#> 4 X     K3       21     6    11
#> 5 Z     K2        8     0     2
#> 6 <NA>  K8       62     2    58

Created on 2018-10-17 by the reprex package (v0.2.1)
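Since gather() and spread() have been superseded by pivot_longer() and pivot_wider(), here is a sketch of a shorter route with the newer functions, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0 (for across()). It swaps in a different technique from the answer above: once Y is in one column, summing each M column per Y produces the wide result directly, so the second reshape and spread() aren't needed. values_drop_na = TRUE drops the blank Y cells instead of keeping an NA group.

```r
library(dplyr)
library(tidyr)

# Same example data as the question (blank Y cells assumed to be NA)
df <- data.frame(
  X   = c("K3", "K8", "K2"),
  M.1 = c(21, 31, 8),
  M.2 = c(6, 1, 0),
  M.3 = c(11, 29, 2),
  Y.1 = c("L", "N", "L"),
  Y.2 = c("N", NA, "Q"),
  Y.3 = c("X", NA, "Z"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # One row per (original row, non-missing Y value)
  pivot_longer(Y.1:Y.3, names_to = "key", values_to = "Y",
               values_drop_na = TRUE) %>%
  group_by(Y) %>%
  summarise(
    across(M.1:M.3, sum),                           # column-wise sums per Y
    X.all = paste(sort(unique(X)), collapse = ",")  # bonus column
  )

result
```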
