Reputation: 13
UPDATE : I've updated the example because it wasn't clear enough.
I am trying to aggregate in R columns of a dataframe based on a condition. My dataframe looks like this:
df <- data.frame(year = rep(2005, 8),
id = 1:8,
crash_x = c(0, 2, 0, 0, 4, 0,1,2),
crash_y = c(1, 0, 0, 0, 0, 1,0,0),
crash_z = c(0, 0, 3, 1, 0, 0,0,0),
injured_x = c(0, 1, 0, 0, 3, 0,0,0),
injured_y = c(0, 0, 2, 1, 0, 0,1,2),
injured_z = c(3, 0, 0, 0, 0, 2, 0,0))
year id crash_x crash_y crash_z injured_x injured_y injured_z
2005 1 0 1 0 0 0 3
2005 2 2 0 0 1 0 0
2005 3 0 0 3 0 2 0
2005 4 0 0 1 0 1 0
2005 5 4 0 0 3 0 0
2005 6 0 1 0 0 0 2
2005 7 1 0 0 0 1 0
2005 8 2 0 0 0 2 0
I would like to sum the columns on the condition that the columns crash_ and injured_ that share the same suffix (x, y, or z) have numbers greater than 0 in the same rows, e.g., rows 1 and 6, rows 3 and 4, rows 2 and 5, rows 7 and 8, etc.
The output should look like:
year crash_x crash_y crash_z injured_x injured_y injured_z
2005 0 2 0 0 0 5
2005 6 0 0 4 0 0
2005 0 0 4 0 3 0
2005 3 0 0 0 3 0
Is this possible ? Thanks!!
Upvotes: 1
Views: 81
Reputation: 1382
This solution first creates a new column with the "pattern" of 0 and non-0 values:
df <- data.frame(year = rep(2005, 8),
id = 1:8,
crash_x = c(0, 2, 0, 0, 4, 0,1,2),
crash_y = c(1, 0, 0, 0, 0, 1,0,0),
crash_z = c(0, 0, 3, 1, 0, 0,0,0),
injured_x = c(0, 1, 0, 0, 3, 0,0,0),
injured_y = c(0, 0, 2, 1, 0, 0,1,2),
injured_z = c(3, 0, 0, 0, 0, 2, 0,0))
df %<>% unite("pattern", c(crash_x, crash_y, crash_z, injured_x, injured_y, injured_z), remove = FALSE) %>%
mutate(pattern = gsub("[1-9]", "1", pattern))
Then summarizes each column according to pattern group with dplyr
:
df %>% group_by(pattern, year) %>%
summarise_at(vars(crash_x, crash_y, crash_z, injured_x, injured_y, injured_z), sum)
Upvotes: 1
Reputation: 5456
The easiest way is to reshape (base R variant):
library(reshape2)
d <- read.table(text = "year id crash_x crash_y crash_z injured_x injured_y injured_z
2005 1 0 1 0 0 0 3
2005 2 2 0 0 1 0 0
2005 3 0 0 3 0 2 0
2005 4 0 0 1 0 1 0
2005 5 4 0 0 3 0 0
2005 6 0 1 0 0 0 2", header = T, stringsAsFactors = F)
want <- melt(subset(d, select = -id), id.vars = "year", variable.name = "crash", value.name = "val")
want$postfix <- gsub("(^crash_)|(^injured_)", "", want$crash)
want <- aggregate(val ~ crash + year + postfix, want, sum)
dcast(want, year + postfix ~ crash, value.var = "val", fill = 0)
# year postfix crash_x crash_y crash_z injured_x injured_y injured_z
#1 2005 x 6 0 0 4 0 0
#2 2005 y 0 2 0 0 3 0
#3 2005 z 0 0 4 0 0 5
Upvotes: 0