Summarise variable across multi-period time-lag

Question

I am looking for a way to fit the following two dplyr operations into one and make my operation shorter and more elegant:

Overall, I want to create a sum across counter if conditions x and y are met and then allocate this sum to the 4 coming quarters (quarter), excluding the current quarter. There are cases where this leads to an overlap, which needs to be summed up. I cannot use dplyr's lag() because due to the summarise function, I do not have all consequential quarters in the output. This is why I had to do a "detour" and split the dplyr operation into two. I am now looking for an elegant way to do it all in one operation and to avoid the intermediary step.

#Rep example

compid <- c(replicate(10, "A"), replicate(10, "B"))
quarter <- c(11:20, 11:20)
x <- c(0,1,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0)
counter <- c(0,1,2,0,1,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0)

dat <- data.frame(compid, quarter, x, counter)


#First, I create the sum count

dat %>%
  group_by(compid, quarter) %>%
  filter(x == 1) %>%
  summarise(sumcount = sum(counter)) %>%
  ungroup() -> temp

#Then, I did not know how to opearte in dplyr. I want to eliminate this intermediary step.

temp1 <- temp
temp1$quarter <- temp1$quarter + 1

temp2 <- temp
temp2$quarter <- temp2$quarter + 2

temp3 <- temp
temp3$quarter <- temp3$quarter + 3

temp4 <- temp
temp4$quarter <- temp4$quarter + 4

temp <- rbind(temp1, temp2, temp3, temp4)

#Lastly, I went back to dplyr to consolidate and refine the data

temp %>%
  group_by(compid, quarter) %>%
  summarise(sumcount = sum(sumcount)) %>%
  right_join(dat, by = c("compid", "quarter")) %>%
  mutate(sumcount = ifelse(is.na(sumcount), 0, sumcount))

Z.Lin · Accepted Answer

Here's a tidyverse approach to replace the intermediate steps:

dat %>%

  # this chunk is unchanged
  group_by(compid, quarter) %>%
  filter(x == 1) %>%
  summarise(sumcount = sum(counter)) %>%
  ungroup() %>%

  # this replaces the creation of temp datasets
  mutate(Q1 = quarter + 1,
         Q2 = quarter + 2,
         Q3 = quarter + 3,
         Q4 = quarter + 4) %>%
  select(-quarter) %>%
  tidyr::gather(key, quarter, -compid, -sumcount) %>%
  select(compid, quarter, sumcount) %>%

  # this chunk is unchanged
  group_by(compid, quarter) %>%
  summarise(sumcount = sum(sumcount)) %>%
  right_join(dat, by = c("compid", "quarter")) %>%
  mutate(sumcount = ifelse(is.na(sumcount), 0, sumcount))

# A tibble: 20 x 5
# Groups:   compid [2]
   compid quarter sumcount     x counter
              
 1      A      11        0     0       0
 2      A      12        0     1       1
 3      A      13        1     1       2
 4      A      14        3     0       0
 5      A      15        3     0       1
 6      A      16        3     0       0
 7      A      17        2     0       0
 8      A      18        0     0       0
 9      A      19        0     0       0
10      A      20        0     0       0
11      B      11        0     1       1
12      B      12        1     1       1
13      B      13        2     0       0
14      B      14        2     0       1
15      B      15        2     0       0
16      B      16        1     0       0
17      B      17        0     0       0
18      B      18        0     0       0
19      B      19        0     0       0
20      B      20        0     0       0

By the way, I didn't see any condition for y, just x, but I assume that's not critical to the problem being asked here.

Summarise variable across multi-period time-lag

Answers (1)

Related Questions