Reputation: 618
I want to remove rows from a data.frame where all cols are NA
. But I'd like to keep rows that have some values NA
.
I know how to do this with base R but I'm trying to figure out how to make it work with tidyverse
. I'm trying the across
operator.
library(tidyverse)
teste <- data.frame(a = c(1,NA,3, NA), b = c(NA, NA, 3, 4), c = c(1, NA, 3, 4))
teste
#> a b c
#> 1 1 NA 1
#> 2 NA NA NA
#> 3 3 3 3
#> 4 NA 4 4
# I whant to remove rows where all values are NA
# that is, remove only line 2
# here I can get the lines with all values NA
teste %>%
filter(across(a:c, is.na))
#> a b c
#> 1 NA NA NA
# If I negate the filter, it does not work
# the last line (NA, 4, 4) is missing
teste %>%
filter(!across(a:c, is.na))
#> a b c
#> 1 1 NA 1
#> 2 3 3 3
# This is what I'm expecting
# a b c
# 1 NA 1
# 3 3 3
# NA 4 4
# Using base I can do this with
teste[apply(teste, 1, function(x) sum(is.na(x))) < 3,]
#> a b c
#> 1 1 NA 1
#> 3 3 3 3
#> 4 NA 4 4
How can I do this using tidyverse
?
Created on 2020-08-18 by the reprex package (v0.3.0)
Upvotes: 5
Views: 1191
Reputation: 81
Using data.table, you can produce the same outcome.
teste2 <- teste[-which(is.na(teste$a)&is.na(teste$b)&is.na(teste$c)),]
Upvotes: 2
Reputation: 389235
Previously in dplyr
, you could use filter_all
(for all columns)/filter_at
(for specific columns) which had any_vars
:
library(dplyr)
teste %>% filter_all(any_vars(!is.na(.)))
However, across
does not have direct replacement of any_vars
so you can use this with Reduce
:
teste %>% filter(Reduce(`|`, across(.fns = Negate(is.na))))
# a b c
#1 1 NA 1
#2 3 3 3
#3 NA 4 4
Upvotes: 2
Reputation: 887851
We can use base R
teste[rowSums(!is.na(teste)) >0,]
# a b c
#1 1 NA 1
#3 3 3 3
#4 NA 4 4
Or using apply
and any
teste[apply(!is.na(teste), 1, any),]
which can be also used within filter
teste %>%
filter(rowSums(!is.na(.)) >0)
Or using c_across
from dplyr
, we can directly remove the rows with all
NA
library(dplyr)
teste %>%
rowwise %>%
filter(!all(is.na(c_across(everything()))))
# A tibble: 3 x 3
# Rowwise:
# a b c
# <dbl> <dbl> <dbl>
#1 1 NA 1
#2 3 3 3
#3 NA 4 4
NOTE: filter_all
is getting deprecated
Upvotes: 5