Moses
Moses

Reputation: 1484

filtering any missing values in R

I have been trying to filter any missing values in R using dplyr but I don't seem to get it right. Any lead would be grateful.

library(dplyr)
library(tibble)

set.seed(2021)
testdata <- tibble(
  a1 = c(10, 12, NA, 10, 13),
  a2 = c("Test", "Test1", "Test 2", NA, NA),
  a3 = c(NA, "Test 10", "Test 2", NA, "Test 4"),
  a4 = rbinom(5, 10, 0.5)
)

testdata2 <- testdata %>% 
  filter(across(.cols = everything(), ~any_vars(is.na(.))))

Upvotes: 5

Views: 3645

Answers (4)

jpenzer
jpenzer

Reputation: 919

To remove any rows that have an NA value you'll need to edit your code slightly, to include a negation (i.e. filter for the rows that return a FALSE when you ask if they contain missing values).

I also used .cols = contains("a") to show you a way of using tidy select when you don't want to include every column.

testdata %>%
  filter(across(.cols = contains("a"), ~!is.na(.)))

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388807

If you are using dplyr to do this you can use the functions if_all/if_any to do this.

To select rows with at least one missing value -

library(dplyr)

testdata %>% filter(if_any(everything(), is.na))

#    a1 a2     a3        a4
#  <dbl> <chr>  <chr>  <int>
#1    10 Test   NA         5
#2    NA Test 2 Test 2     6
#3    10 NA     NA         5
#4    13 NA     Test 4     6

To select rows with no missing values -

testdata %>% filter(if_all(everything(), Negate(is.na)))

#     a1 a2    a3         a4
#  <dbl> <chr> <chr>   <int>
#1    12 Test1 Test 10     6

Upvotes: 8

Yuriy Saraykin
Yuriy Saraykin

Reputation: 8880

library(tidyverse)
set.seed(2021)
testdata <- tibble(
  a1 = c(10, 12, NA, 10, 13),
  a2 = c("Test", "Test1", "Test 2", NA, NA),
  a3 = c(NA, "Test 10", "Test 2", NA, "Test 4"),
  a4 = rbinom(5, 10, 0.5)
)

testdata %>% 
  filter(rowSums(across(.cols = everything(), .fns = is.na)) > 0)
#> # A tibble: 4 x 4
#>      a1 a2     a3        a4
#>   <dbl> <chr>  <chr>  <int>
#> 1    10 Test   <NA>       5
#> 2    NA Test 2 Test 2     6
#> 3    10 <NA>   <NA>       5
#> 4    13 <NA>   Test 4     6

Created on 2022-01-25 by the reprex package (v2.0.1)

or based on @Maël answer

anti_join(x = testdata, y = drop_na(testdata))
#> Joining, by = c("a1", "a2", "a3", "a4")
#> # A tibble: 4 x 4
#>      a1 a2     a3        a4
#>   <dbl> <chr>  <chr>  <int>
#> 1    10 Test   <NA>       5
#> 2    NA Test 2 Test 2     6
#> 3    10 <NA>   <NA>       5
#> 4    13 <NA>   Test 4     6

Upvotes: 2

Ma&#235;l
Ma&#235;l

Reputation: 51894

For the sake of completeness, here's a base R concise solution:

testdata[apply(testdata, 1, \(x) any(is.na(x))),]

     a1 a2     a3        a4
1    10 Test   NA         5
2    NA Test 2 Test 2     6
3    10 NA     NA         5
4    13 NA     Test 4     6

There is also a function keep_na in the hacksaw package:

hacksaw::keep_na(testdata, .logic = "OR")

If you wish to select rows with no NAs, I'd suggest drop_na:

tidyr::drop_na(testdata)

     a1 a2    a3         a4
1    12 Test1 Test 10     6

Upvotes: 2

Related Questions