AlexP
AlexP

Reputation: 147

Sum multiple columns that have specific name in columns

I would like to sum the values of Var1 and Var2 for each row and produce a new column titled Vars which gives the total of Var1 and Var2. I would then like to do the same for Col1 and Col2 and have a new column titled Cols which gives the sum of Col1 and Col2. How do I write the code for this? Thanks in advance.

         df             
         ID Var1    Var2    Col1    Col2
         1    34      22      34      24
         2     3      25      54      65
         3    87      68      14      78
         4    66      98      98     100
         5    55      13      77       2

Expected outcome would be the following:

          df                        
          ID    Var1    Var2    Col1    Col2    Vars    Cols
          1     34      22      34      24      56      58
          2     3       25      54      65      28      119
          3     87      68      14      78      155     92
          4     66      98      98      100     164     198
          5     55      13      77      2       68      79

Upvotes: 3

Views: 703

Answers (4)

GKi
GKi

Reputation: 39657

A solution summing up all columns witch have the same name and end with numbers using gsub in base:

tt <- paste0(gsub('[[:digit:]]+', '', names(df)[-1]),"s")
df <- cbind(df, sapply(unique(tt), function(x) {rowSums(df[grep(x, tt)+1])}))
df
#  ID Var1 Var2 Col1 Col2 Vars Cols
#1  1   34   22   34   24   56   58
#2  2    3   25   54   65   28  119
#3  3   87   68   14   78  155   92
#4  4   66   98   98  100  164  198
#5  5   55   13   77    2   68   79

Or an even more general solution:

idx <- grep('[[:digit:]]',  names(df))
tt <- paste0(gsub('[[:digit:]]+', '', names(df)[idx]),"s")
df <- cbind(df, sapply(unique(tt), function(x) {rowSums(df[idx[grep(x, tt)]])}))

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 388972

Another dplyr way is to use helper functions starts_with to select columns and then use rowSums to sum those columns.

library(dplyr)

df$Vars <- df %>%  select(starts_with("Var")) %>%  rowSums()
df$Cols <-  df %>%  select(starts_with("Col")) %>%  rowSums()

df
#  ID Var1 Var2 Col1 Col2 Vars Cols
#1  1   34   22   34   24   56   58
#2  2    3   25   54   65   28  119
#3  3   87   68   14   78  155   92
#4  4   66   98   98  100  164  198
#5  5   55   13   77    2   68   79

Upvotes: 1

neilfws
neilfws

Reputation: 33782

Assuming that column ID is irrelevant (no groups) and you are happy to specify column names (solution hard-coded, not generic).

A base R solution:

df$Vars <- rowSums(df1[, c("Var1", "Var2")])
df$Cols <- rowSums(df1[, c("Col1", "Col2")])

A tidyverse solution:

library(dplyr)
library(purrr)

df %>% mutate(Vars = map2_int(Var1, Var2, sum),
              Cols = map2_int(Col1, Col2, sum))

# or just
df %>% mutate(Vars = Var1 + Var2,
              Cols = Col1 + Col2)

Upvotes: 3

TobiO
TobiO

Reputation: 1381

There are many different ways to do this. With

library(dplyr)
df = df %>% #input dataframe
  group_by(ID) %>% #do it for every ID, so every row
  mutate( #add columns to the data frame
    Vars = Var1 + Var2, #do the calculation
    Cols = Col1 + Col2
   )

But there are many other ways, eg with apply-functions etc. I suggest to read about the tidyverse.

Upvotes: 2

Related Questions