Reputation: 661
I have the following data frame:
structure(list(state = c("CA", "CA", "CA", "CA", "CA", "CA",
"WA", "WA", "WA", "PA", "PA"), prop = c(0.4, 0.5, 0.3, NA, 0.61,
0.3, 0.76, 0.4, 0.35, 0.21, NA), year = c(2000, 2000, 2001, 2001,
2001, 2002, 2002, 2002, 2004, 2000, 2005)), class = "data.frame", row.names = c(NA,
-11L))
I want to check if the previous state-year group in the dataset contains a prop lesser or equal than 0.6.
So, my desired output is:
structure(list(state = c("CA", "CA", "CA", "CA", "CA", "CA",
"WA", "WA", "WA", "PA", "PA"), prop = c(0.4, 0.5, 0.3, NA, 0.61,
0.3, 0.76, 0.4, 0.35, 0.21, NA), year = c(2000, 2000, 2001, 2001,
2001, 2002, 2002, 2002, 2004, 2000, 2005), competitive = c(NA,
NA, 1, 1, 1, 0, NA, NA, 0, NA, 1)), class = "data.frame", row.names = c(NA,
-11L))
As you can tell, since there is no data for CA-1999, the competitive column says NA. And, for CA-2001, since all the prop in CA-2000 are less than 0.6, they are all coded as competitive==1. And, for CA-2002, since CA-2001 contains one observation greater than 0.6, competitiveness==1. And same for PA. PA-2000 is the state-year preceding PA-2005 in the dataset, so the competitive value for PA-2005 is based on PA-2000.
I've tried many different methods and keep getting odd results, such as:
df <- df %>%
arrange(state, year) %>%
group_by(state) %>%
mutate(
prev = lag(prop <= 0.60),
competitive = ifelse(is.na(prev), NA, prev),
competitive = ifelse(any(prop >= 0.60), 0, 1)
)
This is only taking the previous entry, and doesn't consider the entire group.
Upvotes: 0
Views: 18
Reputation: 206253
Here's one method
input %>%
summarize(is_comp = all(prop<.6), .by=c(state, year)) %>%
mutate(is_prev_comp = lag(is_comp)+0, .by=c(state)) %>%
select(-is_comp) %>%
right_join(input)
which returns
# state year is_prev_comp prop
# 1 CA 2000 NA 0.40
# 2 CA 2000 NA 0.50
# 3 CA 2001 1 0.30
# 4 CA 2001 1 NA
# 5 CA 2001 1 0.61
# 6 CA 2002 0 0.30
# 7 WA 2002 NA 0.76
# 8 WA 2002 NA 0.40
# 9 WA 2004 0 0.35
# 10 PA 2000 NA 0.21
# 11 PA 2005 1 NA
Upvotes: 1