Reputation: 101
That's my data frame
Colour = c("red", "blue", "red", "blue", "yellow", "green", "red", "blue", "green", "red", "yellow", "blue")
Volume = c(46,46,57,57,57,57,99,99,99,111,111,122)
Cases = c(7,2,4,2,3,5,1,2,3,2,4,1)
df = data.frame(Colour, Volume, Cases)
I want to sum up Cases if Colour is "red"
OR "blue"
but if Volume is identical.
Those colours which are not specified should be kept. If red and blue can't be summed
up because they differ in Volume
then they should also be kept
The reult should look like that:
Colour = c("red_or_blue","red_or_blue","yellow","green","red_or_blue","green","red","yellow","blue")
Volume = c(46,57,57,57,99,99,111,111,122)
Cases = c(9,6,3,5,3,3,2,4,1)
df_agg = data.frame(Colour, Volume, Cases)
I've figured out a way where I create a further column which assigns an "red_or_blue"
to the row with red or blue and an x for the remaining rows. I then used aggregate:
df$test = ifelse(df$Colour %in% c("red", "blue"),"red_or_blue","x")
df_agg = aggregate(df$Cases, list(df$Volume, df$test), sum)
It works but i found this a bit cumbersome. Is there a more handy way that would skip creating an extra column? In future I need to sum up cases for red/blue AND for Volume 57/99. Having the extra column appears to make it a bit more tricky.
Also, I didn't manage to get the original colour being taken over if it's not red nor blue. I tried it this way but it woudln't work:
df$test = ifelse(df$Colour %in% c("red", "blue"),"red_or_blue",df$Colour)
Cheers, Paul
Upvotes: 2
Views: 1786
Reputation: 21492
I think if you follow @mrdwab's approach, you can use sapply
on each "split volume" to do
df$Cases <- sum(df[(df$Colour =='blue' | df$Colour == 'red'),][,3])
to get the number of cases, and
df$Colour[(df$Colour =='blue' | df$Colour == 'red')] <- 'readandblue'
to change the colornames. I'm also willing to bet there's a 2-line solution using ddply
but I'm not an expert w/ that tool (yet).
Upvotes: 0
Reputation: 193507
Here's a way sticking in base R (but probably not the most efficient way....)
Split your data into groups by Volume
temp = split(df, df$Volume)
Create a quick function to change the values for "red" and "blue" only in groups where there is a "red" AND a "blue" present.
red.and.blue = function(x) {
if (sum(c("red", "blue") %in% x$Colour) > 1) {
x$Colour = gsub("red|blue", "red-and-blue", x$Colour)
} else {
x$Colour = as.character(x$Colour)
}
x
}
Use that function on your temp
object that you created in Step 1:
temp = lapply(temp, red.and.blue)
Use aggregate()
to perform the aggregation you need to do. Specify the names in the aggregate()
arguments so that you maintain your original column names.
temp = lapply(temp, function(x) aggregate(list(Cases = x$Cases),
list(Colour = x$Colour,
Volume = x$Volume), sum))
Put it back all into a data.frame()
. Don't forget to assign a name if you want to store it as is.
do.call(rbind, temp)
# Colour Volume Cases
# 46 red-and-blue 46 9
# 57.1 green 57 5
# 57.2 red-and-blue 57 6
# 57.3 yellow 57 3
# 99.1 green 99 3
# 99.2 red-and-blue 99 3
# 111.1 red 111 2
# 111.2 yellow 111 4
# 122 blue 122 1
Upvotes: 1