Reputation: 740
I am looking for a way to fit the following two dplyr operations into one and make my operation shorter and more elegant:
Overall, I want to create a sum across counter
if conditions x
and y
are met and then allocate this sum to the 4 coming quarters (quarter
), excluding the current quarter. There are cases where this leads to an overlap, which needs to be summed up. I cannot use dplyr's lag()
because due to the summarise function, I do not have all consequential quarters in the output. This is why I had to do a "detour" and split the dplyr operation into two. I am now looking for an elegant way to do it all in one operation and to avoid the intermediary step.
#Rep example
compid <- c(replicate(10, "A"), replicate(10, "B"))
quarter <- c(11:20, 11:20)
x <- c(0,1,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0)
counter <- c(0,1,2,0,1,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0)
dat <- data.frame(compid, quarter, x, counter)
#First, I create the sum count
dat %>%
group_by(compid, quarter) %>%
filter(x == 1) %>%
summarise(sumcount = sum(counter)) %>%
ungroup() -> temp
#Then, I did not know how to opearte in dplyr. I want to eliminate this intermediary step.
temp1 <- temp
temp1$quarter <- temp1$quarter + 1
temp2 <- temp
temp2$quarter <- temp2$quarter + 2
temp3 <- temp
temp3$quarter <- temp3$quarter + 3
temp4 <- temp
temp4$quarter <- temp4$quarter + 4
temp <- rbind(temp1, temp2, temp3, temp4)
#Lastly, I went back to dplyr to consolidate and refine the data
temp %>%
group_by(compid, quarter) %>%
summarise(sumcount = sum(sumcount)) %>%
right_join(dat, by = c("compid", "quarter")) %>%
mutate(sumcount = ifelse(is.na(sumcount), 0, sumcount))
Upvotes: 2
Views: 105
Reputation: 29085
Here's a tidyverse approach to replace the intermediate steps:
dat %>%
# this chunk is unchanged
group_by(compid, quarter) %>%
filter(x == 1) %>%
summarise(sumcount = sum(counter)) %>%
ungroup() %>%
# this replaces the creation of temp datasets
mutate(Q1 = quarter + 1,
Q2 = quarter + 2,
Q3 = quarter + 3,
Q4 = quarter + 4) %>%
select(-quarter) %>%
tidyr::gather(key, quarter, -compid, -sumcount) %>%
select(compid, quarter, sumcount) %>%
# this chunk is unchanged
group_by(compid, quarter) %>%
summarise(sumcount = sum(sumcount)) %>%
right_join(dat, by = c("compid", "quarter")) %>%
mutate(sumcount = ifelse(is.na(sumcount), 0, sumcount))
# A tibble: 20 x 5
# Groups: compid [2]
compid quarter sumcount x counter
<fctr> <dbl> <dbl> <dbl> <dbl>
1 A 11 0 0 0
2 A 12 0 1 1
3 A 13 1 1 2
4 A 14 3 0 0
5 A 15 3 0 1
6 A 16 3 0 0
7 A 17 2 0 0
8 A 18 0 0 0
9 A 19 0 0 0
10 A 20 0 0 0
11 B 11 0 1 1
12 B 12 1 1 1
13 B 13 2 0 0
14 B 14 2 0 1
15 B 15 2 0 0
16 B 16 1 0 0
17 B 17 0 0 0
18 B 18 0 0 0
19 B 19 0 0 0
20 B 20 0 0 0
By the way, I didn't see any condition for y
, just x
, but I assume that's not critical to the problem being asked here.
Upvotes: 1