Reputation: 8454

Sum a column based on the value of a cell in another column of the same row in R

I have this data frame:

names <- c("george","fred","bill","george",'fred',"bill")
val1  <- c(2,3,4,6,7,8)
val2  <- c(3,4,5,6,8,7)
ch    <- c("yes","no","yes","no","yes","no")
tot   <- data.frame(names,val1,val2,ch)


   names val1 val2  ch
1 george    2    3 yes
2   fred    3    4  no
3   bill    4    5 yes
4 george    6    6  no
5   fred    7    8 yes
6   bill    8    7  no

I want to sum val1 and val2 for each name, using only the rows where ch is "yes", to get a new data frame like this:

   names val1 val2
1 george    2    3
2   fred    7    8
3   bill    4    5

Upvotes: 0

Answers (3)

dc37

Reputation: 16178

As an alternative to the tidyverse, you can use the base R function aggregate:

aggregate(tot[tot$ch == "yes", 2:3], by = list(tot[tot$ch == "yes", "names"]), sum)

  Group.1 val1 val2
1    bill    4    5
2    fred    7    8
3  george    2    3

Thanks to @akrun's suggestion, we can use aggregate and its argument subset to avoid double subsetting:

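# List val1 and val2 explicitly: `. ~ names` would also pass ch to sum(),
# which fails when ch is a character column (the data.frame() default since R 4.0)
aggregate(cbind(val1, val2) ~ names, tot, FUN = sum, subset = ch == "yes")
# or
aggregate(cbind(val1, val2) ~ names, subset(tot, ch == "yes"), sum)

   names val1 val2
1   bill    4    5
2   fred    7    8
3 george    2    3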

Upvotes: 2

Karolis Koncevičius

Reputation: 9656

This should be quite fast:

inds <- tot$ch=="yes"
rowsum(tot[inds, c("val1", "val2")], tot$names[inds])

       val1 val2
bill      4    5
fred      7    8
george    2    3
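
The rowsum result keeps the group labels as row names rather than as a column. If the layout from the question (a names column plus val1 and val2) is needed, a minimal sketch of the conversion could look like this; the variable names sums and res are only illustrative:

```r
# Rebuild the example data from the question.
tot <- data.frame(
  names = c("george", "fred", "bill", "george", "fred", "bill"),
  val1  = c(2, 3, 4, 6, 7, 8),
  val2  = c(3, 4, 5, 6, 8, 7),
  ch    = c("yes", "no", "yes", "no", "yes", "no")
)

# Keep only the "yes" rows, then sum val1 and val2 per name.
inds <- tot$ch == "yes"
sums <- rowsum(tot[inds, c("val1", "val2")], tot$names[inds])

# Move the row names into a regular column and reset the row index.
res <- data.frame(names = rownames(sums), sums)
rownames(res) <- NULL
res
```

The groups come back sorted alphabetically, so the row order differs from the order in which the names first appear in tot.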

Upvotes: 2

akrun

Reputation: 887851

We can group by 'names' and apply the == inside summarise_at, so that only the 'val' values whose 'ch' is 'yes' are summed:

library(dplyr)
tot %>%
    group_by(names) %>%
    summarise_at(vars(starts_with('val')), ~ sum(.[ch == 'yes']))
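
In dplyr 1.0.0 and later, summarise_at is superseded by across. Assuming such a version is installed, the same grouped conditional sum might be sketched as follows (res is just an illustrative name):

```r
library(dplyr)

# Rebuild the example data from the question.
tot <- data.frame(
  names = c("george", "fred", "bill", "george", "fred", "bill"),
  val1  = c(2, 3, 4, 6, 7, 8),
  val2  = c(3, 4, 5, 6, 8, 7),
  ch    = c("yes", "no", "yes", "no", "yes", "no")
)

# For each name, sum only the val* entries whose ch is "yes".
res <- tot %>%
  group_by(names) %>%
  summarise(across(starts_with("val"), ~ sum(.x[ch == "yes"])))
res
```

Inside the lambda, .x is the current column and ch refers to the values of ch within the current group.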

Alternatively, filter on 'ch' first. This can drop any 'names' that have no 'yes' rows, so it is better to add a complete at the end:

library(tidyr)
tot %>%
    filter(ch == 'yes') %>%
    group_by(names) %>%
    summarise_at(vars(starts_with('val')), sum) %>%
    complete(names = unique(tot$names))
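
To see why filtering first can lose groups, here is a small base R sketch on a modified copy of the data. The data frame tot2, in which bill has no "yes" rows, is a hypothetical variation and not part of the question:

```r
# Hypothetical variation of the data: bill never has ch == "yes".
tot2 <- data.frame(
  names = c("george", "fred", "bill", "george", "fred", "bill"),
  val1  = c(2, 3, 4, 6, 7, 8),
  val2  = c(3, 4, 5, 6, 8, 7),
  ch    = c("yes", "no", "no", "no", "yes", "no")
)

# Filtering before aggregating: bill disappears from the result.
filtered <- aggregate(cbind(val1, val2) ~ names, subset(tot2, ch == "yes"), sum)

# Zeroing the non-"yes" rows instead keeps every name, with 0 for bill.
masked <- tot2
masked[masked$ch != "yes", c("val1", "val2")] <- 0
kept <- aggregate(cbind(val1, val2) ~ names, masked, sum)

filtered
kept
```

Note the difference in how a missing group is reported: summing inside each group gives 0 for it, whereas complete fills the restored rows with NA by default.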

Upvotes: 2
