I have this data frame:
names <- c("george","fred","bill","george",'fred',"bill")
val1 <- c(2,3,4,6,7,8)
val2 <- c(3,4,5,6,8,7)
ch <- c("yes","no","yes","no","yes","no")
tot <- data.frame(names,val1,val2,ch)
names val1 val2 ch
1 george 2 3 yes
2 fred 3 4 no
3 bill 4 5 yes
4 george 6 6 no
5 fred 7 8 yes
6 bill 8 7 no
I want to sum val1 and val2 for every name, counting only the rows where ch is "yes", to get a new data frame like this:
names val1 val2
1 george 2 3
2 fred 7 8
3 bill 4 5
As an alternative to the tidyverse packages, you can use the base R function aggregate, for example:
aggregate(tot[tot$ch == "yes", 2:3], by = list(tot[tot$ch == "yes", "names"]), FUN = sum)
Group.1 val1 val2
1 bill 4 5
2 fred 7 8
3 george 2 3
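If you want the grouping column to be called names (as in the desired output) rather than Group.1, you can name the element of the by list. A minimal sketch that rebuilds the example data so it runs on its own; yes_rows and res are just helper names chosen here:

```r
names <- c("george", "fred", "bill", "george", "fred", "bill")
val1 <- c(2, 3, 4, 6, 7, 8)
val2 <- c(3, 4, 5, 6, 8, 7)
ch <- c("yes", "no", "yes", "no", "yes", "no")
tot <- data.frame(names, val1, val2, ch)

# logical index of the rows that should be summed
yes_rows <- tot$ch == "yes"

# naming the element of the by list sets the name of the grouping column
res <- aggregate(tot[yes_rows, c("val1", "val2")],
                 by = list(names = tot$names[yes_rows]),
                 FUN = sum)
res
```

The groups come back sorted alphabetically (bill, fred, george), the same as with the unnamed version.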
Thanks to @akrun's suggestion, we can use aggregate with its subset argument to avoid subsetting twice:
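# cbind(val1, val2) on the left sums only the val columns; ". ~ names" would also try to sum ch,
# which fails on a character column (the default for strings since R 4.0.0)
aggregate(cbind(val1, val2) ~ names, data = tot, FUN = sum, subset = ch == "yes")
# or
aggregate(cbind(val1, val2) ~ names, data = subset(tot, ch == "yes"), FUN = sum)
names val1 val2
1 bill 4 5
2 fred 7 8
3 george 2 3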
Base R's rowsum does the grouped sum in compiled code, so this should be quite fast:
inds <- tot$ch=="yes"
rowsum(tot[inds, c("val1", "val2")], tot$names[inds])
val1 val2
bill 4 5
fred 7 8
george 2 3
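rowsum returns the group labels as row names rather than as a column. If you need the names column from the desired output, one way to get it (a sketch; res and out are names chosen here) is to move the row names into a regular column:

```r
tot <- data.frame(
  names = c("george", "fred", "bill", "george", "fred", "bill"),
  val1 = c(2, 3, 4, 6, 7, 8),
  val2 = c(3, 4, 5, 6, 8, 7),
  ch = c("yes", "no", "yes", "no", "yes", "no")
)

inds <- tot$ch == "yes"
# grouped sums; the groups become row names, sorted by default (reorder = TRUE)
res <- rowsum(tot[inds, c("val1", "val2")], tot$names[inds])

# copy the row names into a names column and reset the row names to 1, 2, 3
out <- data.frame(names = rownames(res), res, row.names = NULL)
out
```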
We can either group by 'names' and do the == comparison inside summarise_at, so that each 'val' column is summed only over the rows where 'ch' is 'yes':
library(dplyr)
tot %>%
group_by(names) %>%
summarise_at(vars(starts_with('val')), ~ sum(.[ch == 'yes']))
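summarise_at() is superseded in dplyr 1.0.0 and later; the same idea can be written with across(). A sketch assuming dplyr >= 1.0.0 is installed:

```r
library(dplyr)

tot <- data.frame(
  names = c("george", "fred", "bill", "george", "fred", "bill"),
  val1 = c(2, 3, 4, 6, 7, 8),
  val2 = c(3, 4, 5, 6, 8, 7),
  ch = c("yes", "no", "yes", "no", "yes", "no")
)

res <- tot %>%
  group_by(names) %>%
  # for each val column, sum only the values on rows where ch is "yes" within the group
  summarise(across(starts_with("val"), ~ sum(.x[ch == "yes"])))
res
```

Like the summarise_at version, a name with no "yes" rows would still appear, with a sum of 0.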
Or filter on 'ch' first. This could drop any 'names' that have no 'yes' row, so adding complete at the end restores them (with NA in the val columns):
library(tidyr)
tot %>%
filter(ch == 'yes') %>%
group_by(names) %>%
summarise_at(vars(starts_with('val')), sum) %>%
complete(names = unique(tot$names))
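With complete, the restored names get NA in the val columns. If you would rather see 0 for them, complete() accepts a fill argument. A sketch that adds a hypothetical extra name, "anna", with no "yes" row so the effect is visible (assumes dplyr and tidyr are installed):

```r
library(dplyr)
library(tidyr)

# example data plus a hypothetical name ("anna") that has only a "no" row
tot <- data.frame(
  names = c("george", "fred", "bill", "george", "fred", "bill", "anna"),
  val1 = c(2, 3, 4, 6, 7, 8, 1),
  val2 = c(3, 4, 5, 6, 8, 7, 1),
  ch = c("yes", "no", "yes", "no", "yes", "no", "no")
)

res <- tot %>%
  filter(ch == "yes") %>%
  group_by(names) %>%
  summarise_at(vars(starts_with("val")), sum) %>%
  # put back every name from the original data; fill the new rows with 0 instead of NA
  complete(names = unique(tot$names), fill = list(val1 = 0, val2 = 0))
res
```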