Tina
Tina

Reputation: 13

Conditional aggregating by column pairs in R

UPDATE : I've updated the example because it wasn't clear enough.

I am trying to aggregate in R columns of a dataframe based on a condition. My dataframe looks like this:

df <- data.frame(year = rep(2005, 8),
             id = 1:8,
             crash_x = c(0, 2, 0, 0, 4, 0,1,2),
             crash_y = c(1, 0, 0, 0, 0, 1,0,0),
             crash_z = c(0, 0, 3, 1, 0, 0,0,0),
             injured_x = c(0, 1, 0, 0, 3, 0,0,0),
             injured_y = c(0, 0, 2, 1, 0, 0,1,2),
             injured_z = c(3, 0, 0, 0, 0, 2, 0,0))

year id crash_x crash_y crash_z injured_x injured_y injured_z
2005 1    0       1       0         0        0          3
2005 2    2       0       0         1        0          0
2005 3    0       0       3         0        2          0
2005 4    0       0       1         0        1          0
2005 5    4       0       0         3        0          0
2005 6    0       1       0         0        0          2
2005 7    1       0       0         0        1          0
2005 8    2       0       0         0        2          0

I would like to sum the columns on the condition that the columns crash_ and injured_ that share the same suffix (x, y, or z) have numbers greater than 0 in the same rows, e.g., rows 1 and 6, rows 3 and 4, rows 2 and 5, rows 7 and 8, etc.

The output should look like:

year crash_x crash_y crash_z injured_x injured_y injured_z
2005     0       2       0         0        0          5
2005     6       0       0         4        0          0
2005     0       0       4         0        3          0
2005     3       0       0         0        3          0

Is this possible ? Thanks!!

Upvotes: 1

Views: 81

Answers (2)

broti
broti

Reputation: 1382

This solution first creates a new column with the "pattern" of 0 and non-0 values:

df <- data.frame(year = rep(2005, 8),
                 id = 1:8,
                 crash_x = c(0, 2, 0, 0, 4, 0,1,2),
                 crash_y = c(1, 0, 0, 0, 0, 1,0,0),
                 crash_z = c(0, 0, 3, 1, 0, 0,0,0),
                 injured_x = c(0, 1, 0, 0, 3, 0,0,0),
                 injured_y = c(0, 0, 2, 1, 0, 0,1,2),
                 injured_z = c(3, 0, 0, 0, 0, 2, 0,0))

df %<>% unite("pattern", c(crash_x, crash_y, crash_z, injured_x, injured_y, injured_z), remove = FALSE) %>%
  mutate(pattern = gsub("[1-9]", "1", pattern))

Then summarizes each column according to pattern group with dplyr:

df %>% group_by(pattern, year) %>% 
  summarise_at(vars(crash_x, crash_y, crash_z, injured_x, injured_y, injured_z), sum)

Upvotes: 1

r.user.05apr
r.user.05apr

Reputation: 5456

The easiest way is to reshape (base R variant):

library(reshape2)

d <- read.table(text = "year id crash_x crash_y crash_z injured_x injured_y injured_z
2005 1    0       1       0         0        0          3
2005 2    2       0       0         1        0          0
2005 3    0       0       3         0        2          0
2005 4    0       0       1         0        1          0
2005 5    4       0       0         3        0          0
2005 6    0       1       0         0        0          2", header = T, stringsAsFactors = F)

want <- melt(subset(d, select = -id), id.vars = "year", variable.name = "crash", value.name = "val")
want$postfix <- gsub("(^crash_)|(^injured_)", "", want$crash)
want <- aggregate(val ~ crash + year + postfix, want, sum)
dcast(want, year + postfix ~ crash, value.var = "val", fill = 0)

#  year postfix crash_x crash_y crash_z injured_x injured_y injured_z
#1 2005       x       6       0       0         4         0         0
#2 2005       y       0       2       0         0         3         0
#3 2005       z       0       0       4         0         0         5

Upvotes: 0

Related Questions