Reputation: 640
I have a very large dataset that I need help processing conditionally
Here is a snippet:
structure(list(month = c(202407L, 202408L, 202409L, 202410L,
202411L, 202412L, 202501L), item = c("A02", "A02", "A02", "A02",
"A02", "A02", "A02"), estimate = c("35.79", "", "35.79",
"35.79", "", "", ""), cycle = c("M", "O", "O", "M", "O", "O",
"M"), rsp = c(1, 1, 1, 1, 0, 0, 0)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -7L), groups = structure(list(
item = "A02", .rows = structure(list(1:7), ptype = integer(0), class =
c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -1L), .drop = TRUE))
What I would like to do is create a variable out
that behaves like so:
If cycle
is M and estimate
is ""
, then out
is 1
If cycle
is M and estimate
is not ""
, then out
is 0
If cycle
is O and estimate
is ""
, then, look back at most recent row where cycle
is M
If, in that month, estimate
is ""
, then out
= 1
If estimate
is not ""
, then out
= 0.
Here is what I tried:
group_by(item) |>
mutate(out = case_when(cycle == "M" & estimate != "" ~ 0,
cycle == "M"& estimate == "" ~ 1,
cycle == "O" & which.max(cycle == "M" & estimate == "") ~ 1,
.default = 0))
Desired Result--The column out
should look like so: c(0, 0, 0, 0, 0, 0, 1)
Note: the dataframe is grouped and that number of rows between each cycle==M
entry is not necessarily fixed.
Any help is much appreciated. I tried using case_when
because I figured dplyr
would be the easiest way to do this but other suggestions are more than welcome.
Thanks.
Upvotes: 0
Views: 60
Reputation: 389205
Fill up the cycle == "M"
values first and then use tidyr::fill
to get the most recent values copied at cycle == "O"
.
library(dplyr)
df %>%
mutate(out = case_when(cycle == "M" & estimate == "" ~ 1,
cycle == "M" & estimate != "" ~ 0)) %>%
tidyr::fill(out)
# A tibble: 7 × 6
# Groups: item [1]
# month item estimate cycle rsp out
# <int> <chr> <chr> <chr> <dbl> <dbl>
#1 202407 A02 "35.79" M 1 0
#2 202408 A02 "" O 1 0
#3 202409 A02 "35.79" O 1 0
#4 202410 A02 "35.79" M 1 0
#5 202411 A02 "" O 0 0
#6 202412 A02 "" O 0 0
#7 202501 A02 "" M 0 1
Upvotes: 1