Reputation: 1234
I'm looking to remove all dates that have any NAs/missing data for any observation type, from a data.frame
that has duplicate dates. For example, here I'd like to end up with a data.frame
of just three rows with 2018-12-05
data. My non-example data.frame
is quite long, so I'm leaning toward run time over readability, if there are a few ways, and I'm open to tidyverse
and non-tidyverse ways.
date_time <- rep(seq(from=as.POSIXct("2018-12-01", tz="GMT"),
to=as.POSIXct("2018-12-05", tz="GMT"), by="1 day"),3)
value <- c(1,2,NA,NA,5,NA,NA,NA,4,5,7,NA,NA,NA,8)
class <- c(rep("a", 5), rep("b", 5), rep("c", 5))
df <- data.frame(date_time, value, class)
from:
filtered to:
Upvotes: 0
Views: 226
Reputation: 76402
If you need speed, I don't know. But the following function keeps only the rows where the dates have all values of class
.
fun <- function(DF){
DF2 <- DF[!is.na(DF$value), ]
u <- unique(DF2$class)
sp <- split(DF2, DF2$date_time)
inx <- sapply(sp, function(d){
all(u %in% d$class)
})
DF2 <- do.call(rbind, sp[inx])
row.names(DF2) <- NULL
DF2
}
fun(df)
# date_time value class
#1 2018-12-05 5 a
#2 2018-12-05 5 b
#3 2018-12-05 8 c
Edit.
Here is a comparative speed test. Camille's answer is faster for larger dataframes, where speed is more important. And is nicer.
library(microbenchmark)
library(ggplot2)
library(dplyr)
fun2 <- function(DF){
DF %>%
arrange(date_time, class) %>%
group_by(date_time) %>%
mutate(all_vals = all(!is.na(value))) %>%
filter(all_vals)
}
mb <- microbenchmark(
rui = fun(df),
camille = fun2(df)
)
mb1 <- microbenchmark(
rui = fun(df1),
camille = fun2(df1)
)
ap <- autoplot(mb)
ap1 <- autoplot(mb1)
cowplot::plot_grid(ap, ap1)
Upvotes: 2
Reputation: 16832
Within some dplyr
functions, you can use the base all
. Group by date, then find whether all values for each group are non-NA
.
With some extra steps to illustrate:
library(dplyr)
df %>%
arrange(date_time, class) %>%
group_by(date_time) %>%
mutate(all_vals = all(!is.na(value))) %>%
filter(all_vals)
#> # A tibble: 3 x 4
#> # Groups: date_time [1]
#> date_time value class all_vals
#> <dttm> <dbl> <fct> <lgl>
#> 1 2018-12-05 00:00:00 5 a TRUE
#> 2 2018-12-05 00:00:00 5 b TRUE
#> 3 2018-12-05 00:00:00 8 c TRUE
Or more briefly, find the non-NA
s within filter
:
df %>%
group_by(date_time) %>%
filter(all(!is.na(value)))
#> # A tibble: 3 x 3
#> # Groups: date_time [1]
#> date_time value class
#> <dttm> <dbl> <fct>
#> 1 2018-12-05 00:00:00 5 a
#> 2 2018-12-05 00:00:00 5 b
#> 3 2018-12-05 00:00:00 8 c
Created on 2018-12-01 by the reprex package (v0.2.1)
Upvotes: 1