Reputation: 7282
I want to remove the lines from a data.table that only contain NAs.
> tab = data.table(A = c(1, NA, 3), B = c(NA, NA, 3))
> tab
A B
1: 1 NA
2: NA NA
3: 3 3
Normally I would do it with apply(dat, 1, ...)
which unfortunately does not work on a data.table but it leads me to this inelegant solution:
> tab[apply(as.data.frame(tab), 1, function(x) !all(is.na(x))), ]
A B
1: 1 NA
2: 3 3
How can this be achieved the fastest way without knowing the column names?
Upvotes: 2
Views: 362
Reputation: 886978
We can use Reduce
with is.na
and &
tab[!Reduce(`&`, lapply(tab, is.na))]
# A B
#1: 1 NA
#2: 3 3
Or a compact but not so efficient approach would be
tab[rowSums(!is.na(tab)) != 0L]
Also, as commented by @Frank, a join based approach,
tab[!tab[NA_integer_], on = names(tab)]
Upvotes: 6
Reputation: 21621
Another idea:
library(dplyr)
df %>%
filter(rowSums(is.na(.)) < length(.))
Upvotes: 2
Reputation: 165
I quite like
tab <- tab[sapply(1:nrow(tab), function(i){!all(is.na(tab[i,]))}),]
It is intuitive to me, but I'm not sure it is the fastest approach.
HTH
Upvotes: 0