hy9fesh
hy9fesh

Reputation: 661

How do I check a state-year group contains a proportion lesser or equal to 0.60?

I have the following data frame:

structure(list(state = c("CA", "CA", "CA", "CA", "CA", "CA", 
"WA", "WA", "WA", "PA", "PA"), prop = c(0.4, 0.5, 0.3, NA, 0.61, 
0.3, 0.76, 0.4, 0.35, 0.21, NA), year = c(2000, 2000, 2001, 2001, 
2001, 2002, 2002, 2002, 2004, 2000, 2005)), class = "data.frame", row.names = c(NA, 
-11L))

I want to check if the previous state-year group in the dataset contains a prop lesser or equal than 0.6.

So, my desired output is:

structure(list(state = c("CA", "CA", "CA", "CA", "CA", "CA", 
"WA", "WA", "WA", "PA", "PA"), prop = c(0.4, 0.5, 0.3, NA, 0.61, 
0.3, 0.76, 0.4, 0.35, 0.21, NA), year = c(2000, 2000, 2001, 2001, 
2001, 2002, 2002, 2002, 2004, 2000, 2005), competitive = c(NA, 
NA, 1, 1, 1, 0, NA, NA, 0, NA, 1)), class = "data.frame", row.names = c(NA, 
-11L))

As you can tell, since there is no data for CA-1999, the competitive column says NA. And, for CA-2001, since all the prop in CA-2000 are less than 0.6, they are all coded as competitive==1. And, for CA-2002, since CA-2001 contains one observation greater than 0.6, competitiveness==1. And same for PA. PA-2000 is the state-year preceding PA-2005 in the dataset, so the competitive value for PA-2005 is based on PA-2000.

I've tried many different methods and keep getting odd results, such as:

df <- df %>%
  arrange(state, year) %>%
  group_by(state) %>%
  mutate(
    prev = lag(prop <= 0.60),
    competitive = ifelse(is.na(prev), NA, prev),
    competitive = ifelse(any(prop >= 0.60), 0, 1)
    )

This is only taking the previous entry, and doesn't consider the entire group.

Upvotes: 0

Views: 18

Answers (1)

MrFlick
MrFlick

Reputation: 206253

Here's one method


input %>% 
  summarize(is_comp = all(prop<.6), .by=c(state, year)) %>% 
  mutate(is_prev_comp = lag(is_comp)+0, .by=c(state)) %>% 
  select(-is_comp) %>% 
  right_join(input)

which returns

#    state year is_prev_comp prop
# 1     CA 2000           NA 0.40
# 2     CA 2000           NA 0.50
# 3     CA 2001            1 0.30
# 4     CA 2001            1   NA
# 5     CA 2001            1 0.61
# 6     CA 2002            0 0.30
# 7     WA 2002           NA 0.76
# 8     WA 2002           NA 0.40
# 9     WA 2004            0 0.35
# 10    PA 2000           NA 0.21
# 11    PA 2005            1   NA

Upvotes: 1

Related Questions