Reputation: 2732
I want to insert a new column into a data.frame whose value is TRUE when the row contains at least one missing value and FALSE otherwise.
For that problem, apply is a perfect use case:
tab <- data.frame(a = 1:10, b = c(NA, letters[2:10]), c = c(LETTERS[1:9], NA))
tab$missing <- apply(tab, 1, function(x) any(is.na(x)))
However, I loaded the strict package and got this error: apply() coerces X to a matrix so is dangerous to use with data frames. Please use lapply() instead.
I know that I can safely ignore this error, but I was wondering whether there is a simple way to code it using one of the tidyverse packages. I tried with dplyr, unsuccessfully:
tab %>%
rowwise() %>%
mutate(missing = any(is.na(.), na.rm = TRUE))
Upvotes: 11
Views: 9468
Reputation: 309
You can use the complete.cases() function:
tab %>% mutate(missing = !complete.cases(.))
To remove rows with one or more NAs, use:
tab %>% filter(complete.cases(.))
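As a minimal sketch, the same check also works in base R without the pipe, since complete.cases() accepts a data frame directly. The data is the example from the question; `tab_complete` is a name introduced here purely for illustration.

```r
# Example data from the question
tab <- data.frame(a = 1:10,
                  b = c(NA, letters[2:10]),
                  c = c(LETTERS[1:9], NA))

# complete.cases() is TRUE for rows without any NA, so negate it
tab$missing <- !complete.cases(tab)

# Keep only the rows without missing values
# (tab_complete is an illustrative name, not from the original answer)
tab_complete <- tab[!tab$missing, ]
```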
Upvotes: 1
Reputation: 851
This works for the example data:
library(tidyverse)
# tibble() replaces the deprecated data_frame()
tab <- tibble(a = 1:10,
              b = c(NA, letters[2:10]),
              c = c(LETTERS[1:9], NA))
# TRUE when either b or c is NA (a has no missing values in this example)
tab_1 <- tab %>% mutate(missing = is.na(b) | is.na(c))
> tab_1
a b c missing
1 1 <NA> A TRUE
2 2 b B FALSE
3 3 c C FALSE
4 4 d D FALSE
5 5 e E FALSE
6 6 f F FALSE
7 7 g G FALSE
8 8 h H FALSE
9 9 i I FALSE
10 10 j <NA> TRUE
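The answer above names each column by hand, which gets unwieldy as the number of columns grows. One possible generalization, assuming dplyr 1.0.4 or later (where if_any() is available), checks every column at once. `tab_2` is an illustrative name.

```r
library(dplyr)

tab <- tibble(a = 1:10,
              b = c(NA, letters[2:10]),
              c = c(LETTERS[1:9], NA))

# if_any() applies is.na() to each selected column and combines
# the per-column results with a row-wise OR
tab_2 <- tab %>% mutate(missing = if_any(everything(), is.na))
```

Swapping everything() for a narrower selection such as c(b, c) would restrict the check to those columns.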
Upvotes: 1
Reputation: 43334
If you want to avoid coercing to a matrix, you can use purrr::pmap, which iterates across the elements of a list in parallel and passes them to a function:
library(tidyverse)
# tibble() replaces the deprecated data_frame()
tab <- tibble(a = 1:10,
              b = c(NA, letters[2:10]),
              c = c(LETTERS[1:9], NA))
tab %>% mutate(missing = pmap_lgl(., ~any(is.na(c(...)))))
#> # A tibble: 10 x 4
#> a b c missing
#> <int> <chr> <chr> <lgl>
#> 1 1 <NA> A TRUE
#> 2 2 b B FALSE
#> 3 3 c C FALSE
#> 4 4 d D FALSE
#> 5 5 e E FALSE
#> 6 6 f F FALSE
#> 7 7 g G FALSE
#> 8 8 h H FALSE
#> 9 9 i I FALSE
#> 10 10 j <NA> TRUE
In the function, c() is necessary to pull all the parameters passed to the function via ... into a vector, which can be passed to is.na() and collapsed with any(). The *_lgl-suffixed variant of pmap simplifies the result to a logical vector.
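To see what c(...) does with a single row, here is a small base R sketch. `collect` is a hypothetical helper written only for this illustration, and the arguments mimic the first row of the example data as pmap would pass them.

```r
# Hypothetical helper: gather every argument into one vector
collect <- function(...) c(...)

# The first row of the example data, one argument per column
row <- collect(a = 1L, b = NA_character_, c = "A")

# c() coerces mixed types to a common one (character here),
# which does not affect the NA check
is_missing <- is.na(row)       # one logical per column
row_has_na <- any(is_missing)  # collapsed to a single logical
```

The coercion to a common type is harmless for is.na(), but it is worth keeping in mind if the function does anything type-sensitive with the row.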
Note that while this avoids coercing to matrix, it will not necessarily be faster than approaches that do, as matrix operations are highly optimized in R. It may make more sense to explicitly coerce to a matrix, e.g.
tab %>% mutate(missing = rowSums(is.na(as.matrix(.))) > 0)
which returns the same thing.
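Since the strict error message suggests lapply(), another option is to stay column-wise in base R. The following is a sketch rather than a benchmarked recommendation; `has_na` and `via_matrix` are illustrative names.

```r
# Example data from the question
tab <- data.frame(a = 1:10,
                  b = c(NA, letters[2:10]),
                  c = c(LETTERS[1:9], NA))

# lapply() yields one logical vector per column;
# Reduce() combines them element-wise with OR, giving one value per row
has_na <- Reduce(`|`, lapply(tab, is.na))

# The explicit matrix approach, for comparison
via_matrix <- rowSums(is.na(as.matrix(tab))) > 0

tab$missing <- has_na
```

Both vectors flag the same rows; the lapply() version never builds a matrix, so each column keeps its own type during the check.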
Upvotes: 9