jvalenti
jvalenti

Reputation: 640

R - dplyr case_when with which.max issue

I have a very large dataset that I need help processing conditionally

Here is a snippet:

structure(list(month = c(202407L, 202408L, 202409L, 202410L, 
202411L, 202412L, 202501L), item = c("A02", "A02", "A02", "A02", 
"A02", "A02", "A02"), estimate = c("35.79", "", "35.79", 
"35.79", "", "", ""), cycle = c("M", "O", "O", "M", "O", "O", 
"M"), rsp = c(1, 1, 1, 1, 0, 0, 0)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -7L), groups = structure(list(
item = "A02", .rows = structure(list(1:7), ptype = integer(0), class = 
c("vctrs_list_of", 
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -1L), .drop = TRUE))

What I would like to do is create a variable out that behaves like so:

Here is what I tried:

group_by(item) |>
mutate(out = case_when(cycle == "M" & estimate != "" ~ 0,
                     cycle == "M"& estimate == "" ~ 1,
                     cycle == "O" & which.max(cycle == "M" & estimate == "") ~ 1,
                     .default = 0))

Desired Result--The column out should look like so: c(0, 0, 0, 0, 0, 0, 1)

Note: the dataframe is grouped and that number of rows between each cycle==M entry is not necessarily fixed.

Any help is much appreciated. I tried using case_when because I figured dplyr would be the easiest way to do this but other suggestions are more than welcome.

Thanks.

Upvotes: 0

Views: 60

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 389205

Fill up the cycle == "M" values first and then use tidyr::fill to get the most recent values copied at cycle == "O".

library(dplyr)

df %>%
  mutate(out = case_when(cycle == "M" & estimate == "" ~ 1, 
                         cycle == "M" & estimate != "" ~ 0)) %>%
  tidyr::fill(out)

# A tibble: 7 × 6
# Groups:   item [1]
#   month item  estimate cycle   rsp   out
#   <int> <chr> <chr>    <chr> <dbl> <dbl>
#1 202407 A02   "35.79"  M         1     0
#2 202408 A02   ""       O         1     0
#3 202409 A02   "35.79"  O         1     0
#4 202410 A02   "35.79"  M         1     0
#5 202411 A02   ""       O         0     0
#6 202412 A02   ""       O         0     0
#7 202501 A02   ""       M         0     1

Upvotes: 1

Related Questions