remove rows of dates and duplicate dates for any date that has an NA

Question

I'm looking to remove all dates that have any NAs/missing data for any observation type, from a data.frame that has duplicate dates. For example, here I'd like to end up with a data.frame of just three rows with 2018-12-05 data. My non-example data.frame is quite long, so I'm leaning toward run time over readability, if there are a few ways, and I'm open to tidyverse and non-tidyverse ways.

date_time <- rep(seq(from=as.POSIXct("2018-12-01", tz="GMT"), 
             to=as.POSIXct("2018-12-05", tz="GMT"), by="1 day"),3)
value     <- c(1,2,NA,NA,5,NA,NA,NA,4,5,7,NA,NA,NA,8)
class     <- c(rep("a", 5), rep("b", 5), rep("c", 5))
df        <- data.frame(date_time, value, class)

from:

filtered to:

camille · Accepted Answer

Within some dplyr functions, you can use the base all. Group by date, then find whether all values for each group are non-NA.

With some extra steps to illustrate:

library(dplyr)

df %>%
  arrange(date_time, class) %>%
  group_by(date_time) %>%
  mutate(all_vals = all(!is.na(value))) %>%
  filter(all_vals)
#> # A tibble: 3 x 4
#> # Groups:   date_time [1]
#>   date_time           value class all_vals
#>                      
#> 1 2018-12-05 00:00:00     5 a     TRUE    
#> 2 2018-12-05 00:00:00     5 b     TRUE    
#> 3 2018-12-05 00:00:00     8 c     TRUE

Or more briefly, find the non-NAs within filter:

df %>%
  group_by(date_time) %>%
  filter(all(!is.na(value)))
#> # A tibble: 3 x 3
#> # Groups:   date_time [1]
#>   date_time           value class
#>                  
#> 1 2018-12-05 00:00:00     5 a    
#> 2 2018-12-05 00:00:00     5 b    
#> 3 2018-12-05 00:00:00     8 c

^{Created on 2018-12-01 by the reprex package (v0.2.1)}

remove rows of dates and duplicate dates for any date that has an NA

Answers (2)

Related Questions