Conditional aggregating by column pairs in R

Question

UPDATE : I've updated the example because it wasn't clear enough.

I am trying to aggregate in R columns of a dataframe based on a condition. My dataframe looks like this:

df <- data.frame(year = rep(2005, 8),
             id = 1:8,
             crash_x = c(0, 2, 0, 0, 4, 0,1,2),
             crash_y = c(1, 0, 0, 0, 0, 1,0,0),
             crash_z = c(0, 0, 3, 1, 0, 0,0,0),
             injured_x = c(0, 1, 0, 0, 3, 0,0,0),
             injured_y = c(0, 0, 2, 1, 0, 0,1,2),
             injured_z = c(3, 0, 0, 0, 0, 2, 0,0))

year id crash_x crash_y crash_z injured_x injured_y injured_z
2005 1    0       1       0         0        0          3
2005 2    2       0       0         1        0          0
2005 3    0       0       3         0        2          0
2005 4    0       0       1         0        1          0
2005 5    4       0       0         3        0          0
2005 6    0       1       0         0        0          2
2005 7    1       0       0         0        1          0
2005 8    2       0       0         0        2          0

I would like to sum the columns on the condition that the columns crash_ and injured_ that share the same suffix (x, y, or z) have numbers greater than 0 in the same rows, e.g., rows 1 and 6, rows 3 and 4, rows 2 and 5, rows 7 and 8, etc.

The output should look like:

year crash_x crash_y crash_z injured_x injured_y injured_z
2005     0       2       0         0        0          5
2005     6       0       0         4        0          0
2005     0       0       4         0        3          0
2005     3       0       0         0        3          0

Is this possible ? Thanks!!

broti · Accepted Answer

This solution first creates a new column with the "pattern" of 0 and non-0 values:

df <- data.frame(year = rep(2005, 8),
                 id = 1:8,
                 crash_x = c(0, 2, 0, 0, 4, 0,1,2),
                 crash_y = c(1, 0, 0, 0, 0, 1,0,0),
                 crash_z = c(0, 0, 3, 1, 0, 0,0,0),
                 injured_x = c(0, 1, 0, 0, 3, 0,0,0),
                 injured_y = c(0, 0, 2, 1, 0, 0,1,2),
                 injured_z = c(3, 0, 0, 0, 0, 2, 0,0))

df %<>% unite("pattern", c(crash_x, crash_y, crash_z, injured_x, injured_y, injured_z), remove = FALSE) %>%
  mutate(pattern = gsub("[1-9]", "1", pattern))

Then summarizes each column according to pattern group with dplyr:

df %>% group_by(pattern, year) %>% 
  summarise_at(vars(crash_x, crash_y, crash_z, injured_x, injured_y, injured_z), sum)

Conditional aggregating by column pairs in R

Answers (2)

Related Questions