user3249770
user3249770

Reputation: 375

Dplyr mutate column indicator between dates when some dates are missing

I am trying to add a binary column indicator when one columns (dt_3) date is between the dates in two other columns (dt_1 and dt_2). I have this working on a small sample of my data however on my larger set the date column (dt_3) I am looking to compare to the others has a lot of NA's. Which is throwing the error Error: Expecting a single value:. What is the best way to only check if non-NA values are between the two columns.

Here is an example of my data:

  dt_1        dt_2         dt_3
2019-7-10   2019-8-21    2020-2-01
2019-8-22   2019-10-11   2019-9-01
2019-2-09   2019-3-02       NA

My current code:

dates %>%
      mutate(between = ifelse(between(dt_3, dt_1, dt_2), 1, 0))

Expected Output:

      dt_1        dt_2         dt_3     between
    2019-7-10   2019-8-21    2020-2-01    0
    2019-8-22   2019-10-11   2019-9-01    1
    2019-2-09   2019-3-02       NA        0

Upvotes: 1

Views: 428

Answers (1)

akrun
akrun

Reputation: 887153

Instead of the the between an option is the comparison operators (>=, <=) and then replace the NA with 0

library(dplyr)
library(lubridate)
library(tidyr)
dates %>%
     mutate(across(everything(), ymd)) %>%
     mutate(between = mutate(between = replace_na(dt_3 >= dt_1 & dt_3 <= dt_2, 0))

With between, the left and right are not vectorized i.e it takes a single value. One option is rowwise

dates %>%
   mutate(across(everything(), ymd)) %>%
   rowwise %>%
   mutate(between  = replace_na(between(dt_3, dt_1, dt_2), 0))
# A tibble: 3 x 4
# Rowwise: 
#  dt_1       dt_2       dt_3       between
#  <date>     <date>     <date>       <dbl>
#1 2019-07-10 2019-08-21 2020-02-01       0
#2 2019-08-22 2019-10-11 2019-09-01       1
#3 2019-02-09 2019-03-02 NA               0

data

dates <- structure(list(dt_1 = c("2019-7-10", "2019-8-22", "2019-2-09"
), dt_2 = c("2019-8-21", "2019-10-11", "2019-3-02"), dt_3 = c("2020-2-01", 
"2019-9-01", NA)), class = "data.frame", row.names = c(NA, -3L
))

Upvotes: 1

Related Questions