Reputation: 1484
I have been trying to filter any missing values in R using dplyr but I don't seem to get it right. Any lead would be grateful.
library(dplyr)
library(tibble)
set.seed(2021)
testdata <- tibble(
a1 = c(10, 12, NA, 10, 13),
a2 = c("Test", "Test1", "Test 2", NA, NA),
a3 = c(NA, "Test 10", "Test 2", NA, "Test 4"),
a4 = rbinom(5, 10, 0.5)
)
testdata2 <- testdata %>%
filter(across(.cols = everything(), ~any_vars(is.na(.))))
Upvotes: 5
Views: 3645
Reputation: 919
To remove any rows that have an NA value you'll need to edit your code slightly, to include a negation (i.e. filter for the rows that return a FALSE when you ask if they contain missing values).
I also used .cols = contains("a")
to show you a way of using tidy select when you don't want to include every column.
testdata %>%
filter(across(.cols = contains("a"), ~!is.na(.)))
Upvotes: 1
Reputation: 388807
If you are using dplyr
to do this you can use the functions if_all
/if_any
to do this.
To select rows with at least one missing value -
library(dplyr)
testdata %>% filter(if_any(everything(), is.na))
# a1 a2 a3 a4
# <dbl> <chr> <chr> <int>
#1 10 Test NA 5
#2 NA Test 2 Test 2 6
#3 10 NA NA 5
#4 13 NA Test 4 6
To select rows with no missing values -
testdata %>% filter(if_all(everything(), Negate(is.na)))
# a1 a2 a3 a4
# <dbl> <chr> <chr> <int>
#1 12 Test1 Test 10 6
Upvotes: 8
Reputation: 8880
library(tidyverse)
set.seed(2021)
testdata <- tibble(
a1 = c(10, 12, NA, 10, 13),
a2 = c("Test", "Test1", "Test 2", NA, NA),
a3 = c(NA, "Test 10", "Test 2", NA, "Test 4"),
a4 = rbinom(5, 10, 0.5)
)
testdata %>%
filter(rowSums(across(.cols = everything(), .fns = is.na)) > 0)
#> # A tibble: 4 x 4
#> a1 a2 a3 a4
#> <dbl> <chr> <chr> <int>
#> 1 10 Test <NA> 5
#> 2 NA Test 2 Test 2 6
#> 3 10 <NA> <NA> 5
#> 4 13 <NA> Test 4 6
Created on 2022-01-25 by the reprex package (v2.0.1)
or based on @Maël answer
anti_join(x = testdata, y = drop_na(testdata))
#> Joining, by = c("a1", "a2", "a3", "a4")
#> # A tibble: 4 x 4
#> a1 a2 a3 a4
#> <dbl> <chr> <chr> <int>
#> 1 10 Test <NA> 5
#> 2 NA Test 2 Test 2 6
#> 3 10 <NA> <NA> 5
#> 4 13 <NA> Test 4 6
Upvotes: 2
Reputation: 51894
For the sake of completeness, here's a base R concise solution:
testdata[apply(testdata, 1, \(x) any(is.na(x))),]
a1 a2 a3 a4
1 10 Test NA 5
2 NA Test 2 Test 2 6
3 10 NA NA 5
4 13 NA Test 4 6
There is also a function keep_na
in the hacksaw
package:
hacksaw::keep_na(testdata, .logic = "OR")
If you wish to select rows with no NAs, I'd suggest drop_na
:
tidyr::drop_na(testdata)
a1 a2 a3 a4
1 12 Test1 Test 10 6
Upvotes: 2