Reputation: 1972
I have the following kind of data:
library(tidyverse)
library(lubridate)
data <- tibble(a = c(1, 1, 2, 3, 3),
               b = c('x', 'y', 'z', 'z', 'z'),
               c = c('ps', 'ps', 'qs', 'rs', 'rs'),
               d = c(100, 200, 300, 400, 500),
               strt = ymd(c('2019-03-20', '2020-01-01', '2018-01-02', '2020-05-01', '2016-01-01')),
               fnsh = ymd(c(NA, NA, NA, '2020-06-01', '2016-05-01')))
The operation has to apply to the data as grouped by a, b, c (i.e. data %>% group_by(a, b, c)).
I want to add a column that shows whether or not a group has a start within the latest year. To have a start within the latest year, a group has to:
1) Have a row with strt within the latest year
2) Not have a row with strt before the latest year and fnsh as NA (no disqualifying overlap)
3) Not have a row with strt before the latest year and fnsh as equal to or later than the latest of all entries in strt (no disqualifying overlap)
I am thus trying to get:
tibble(a = c(1, 1, 2, 3, 3),
       b = c('x', 'y', 'z', 'z', 'z'),
       c = c('ps', 'ps', 'qs', 'rs', 'rs'),
       d = c(100, 200, 300, 400, 500),
       strt = ymd(c('2019-03-20', '2020-01-01', '2018-01-02', '2020-05-01', '2016-01-01')),
       fnsh = ymd(c(NA, NA, NA, '2020-06-01', '2016-05-01')),
       startLatestYear = c(0, 1, 0, 1, 1))
My current approach is:
test <- data %>%
  group_by(a, b, c) %>%
  mutate(startLatestYear = case_when(all(is.na(fnsh)) &
                                       min(strt) > today(tzone = 'CET') - years(1) &
                                       min(strt) <= today(tzone = 'CET') ~ 1,
                                     strt > today(tzone = 'CET') - years(1) &
                                       strt <= today(tzone = 'CET') &
                                       nrow(filter(., strt < today(tzone = 'CET') - years(1) &
                                                     fnsh %in% NA)) == 0 &
                                       nrow(filter(., strt < today(tzone = 'CET') - years(1))) > 0 &
                                       strt > max(pull(filter(., strt < today(tzone = 'CET') - years(1)), fnsh)) ~ 1,
                                     TRUE ~ 0))
The first if in my use of case_when() seems to work, but the second does not. I suspect that my use of the . placeholder is wrong. How can I get the desired output?
Upvotes: 3
Views: 206
Reputation: 57686
The . placeholder is a facility provided by the magrittr package, where it refers to the left-hand side of the %>% operator. %>% knows nothing about dplyr verbs, so when you use . inside the mutate, it simply expands to the object that was piped in. In the case of a grouped data frame, that means the entire data frame, not the per-group subsets.
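A small sketch can make the difference visible. The toy data frame df and the column names rows_via_dot and rows_in_group are made up for illustration and are not part of the question's data; n() is dplyr's per-group row count.

```r
library(dplyr)

# Hypothetical toy data: group "a" has two rows, group "b" has one
df <- tibble(g = c("a", "a", "b"), x = c(1, 2, 3))

res <- df %>%
  group_by(g) %>%
  mutate(rows_via_dot  = nrow(.),  # `.` is the whole piped-in data frame
         rows_in_group = n()) %>%  # n() is evaluated separately per group
  ungroup()

print(res)
```

Here rows_via_dot is the same for every row because . ignores the grouping, while rows_in_group varies by group.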
The best solution I've found so far is to replace the mutate with a group_modify:
data %>%
  group_by(a, b, c) %>%
  group_modify(function(.x, .y) {
    .x %>% mutate(startLatestYear = case_when(...))
  })
This works because now the pipeline inside group_modify is executed separately for each group.
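For a fully worked version, here is a minimal sketch of one possible reading of the three conditions from the question. It is not the original case_when() logic: the helper flag_group, the variables ref and cutoff, and the fixed reference date 2020-06-15 (used in place of today() so the result is reproducible) are all assumptions made for illustration.

```r
library(dplyr)
library(lubridate)

data <- tibble(a = c(1, 1, 2, 3, 3),
               b = c('x', 'y', 'z', 'z', 'z'),
               c = c('ps', 'ps', 'qs', 'rs', 'rs'),
               d = c(100, 200, 300, 400, 500),
               strt = ymd(c('2019-03-20', '2020-01-01', '2018-01-02', '2020-05-01', '2016-01-01')),
               fnsh = ymd(c(NA, NA, NA, '2020-06-01', '2016-05-01')))

ref    <- as.Date('2020-06-15')  # assumed stand-in for today()
cutoff <- ref - years(1)         # start of the "latest year" window

# Hypothetical helper: .x holds the rows of one group (without the keys)
flag_group <- function(.x, .y) {
  in_year  <- .x$strt > cutoff & .x$strt <= ref               # condition 1
  old      <- .x$strt <= cutoff
  blocking <- old & (is.na(.x$fnsh) |                         # condition 2
                       .x$fnsh >= max(.x$strt))               # condition 3
  .x %>% mutate(startLatestYear = as.numeric(any(in_year) && !any(blocking)))
}

res <- data %>%
  group_by(a, b, c) %>%
  group_modify(flag_group) %>%
  ungroup()

print(res)
```

With the assumed reference date, this reproduces the startLatestYear column requested in the question. Swapping ref for today(tzone = 'CET') gives results relative to the current date instead.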
Upvotes: 1