bison2178
bison2178

Reputation: 789

r filter dates with NAs

I am dealing with a dataset like this

Id   Col1   Date   FromDate   ToDate     Weight
1    Blue   2018   NA         NA         24.9
2    Green  2018   NA         NA         14.5
3    Red    2010   2009-05-15 2015-09-15 87.5
3    Red    2010   2015-10-15 2020-08-15 43.8
4    Yellow 2020   2019-10-15 2021-05-15 29.5
5    Purple 2011   NA         NA         NA

Based on the logic the expected dataset would be like this

Id   Col1   Date   FromDate   ToDate     Weight
1    Blue   2018   NA         NA         24.9
2    Green  2018   NA         NA         14.5
3    Red    2010   2009-05-15 2015-09-15 87.5

4    Yellow 2020   2019-10-15 2021-05-15 29.5
5    Purple 2011   NA         NA         NA

I know I can use the filter function but I am not sure how to make it work with the NA values not excluded. Any suggestion is much appreciated.

Upvotes: 0

Views: 752

Answers (2)

R.W.
R.W.

Reputation: 99

Here's another solution, which utilises the {lubridate} package.

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
dates <- tibble::tribble(
           ~Id,    ~Col1, ~Date,    ~FromDate,      ~ToDate, ~Weight,
            1L,   "Blue", 2018L,           NA,           NA,    24.9,
            2L,  "Green", 2018L,           NA,           NA,    14.5,
            3L,    "Red", 2010L, "2009-05-15", "2015-09-15",    87.5,
            3L,    "Red", 2010L, "2015-10-15", "2020-08-15",    43.8,
            4L, "Yellow", 2020L, "2019-10-15", "2021-05-15",    29.5,
            5L, "Purple", 2011L,           NA,           NA,      NA
           )

dates %>% 
  mutate(FromDate = lubridate::ymd(FromDate),
         ToDate   = lubridate::ymd(ToDate),
         Date     = lubridate::ymd(Date, truncated = 2L),
         Interval = lubridate::interval(ymd(FromDate), ymd(ToDate))) %>% 
  filter( ((Date %within% Interval) == TRUE) | (is.na(FromDate) & is.na(ToDate))) %>% 
  select(-Interval)
#> # A tibble: 5 × 6
#>      Id Col1   Date       FromDate   ToDate     Weight
#>   <int> <chr>  <date>     <date>     <date>      <dbl>
#> 1     1 Blue   2018-01-01 NA         NA           24.9
#> 2     2 Green  2018-01-01 NA         NA           14.5
#> 3     3 Red    2010-01-01 2009-05-15 2015-09-15   87.5
#> 4     4 Yellow 2020-01-01 2019-10-15 2021-05-15   29.5
#> 5     5 Purple 2011-01-01 NA         NA           NA

Created on 2021-10-11 by the reprex package (v2.0.1)

Upvotes: 0

student
student

Reputation: 999

You should be able to do like this:

df <- tribble(
  ~Date, ~From, ~To,
  2018, NA, NA,
  2010, "2009-05-09", "2015-09-15"
)

df %>% filter(
  # Condition 1
  (is.na(From) & is.na(To)) | 
    # Condition 2
    ((Date > year(From))) & 
       (Date < year(To))))
  )

Upvotes: 1

Related Questions