Reputation: 7282

Remove lines with only NAs from data.table

I want to remove the lines from a data.table that only contain NAs.

> tab = data.table(A = c(1, NA, 3), B = c(NA, NA, 3))
> tab
    A  B
1:  1 NA
2: NA NA
3:  3  3

Normally I would do it with apply(dat, 1, ...) which unfortunately does not work on a data.table but it leads me to this inelegant solution:

> tab[apply(as.data.frame(tab), 1, function(x) !all(is.na(x))), ]
   A  B
1: 1 NA
2: 3  3

How can this be achieved the fastest way without knowing the column names?

Upvotes: 2

Answers (3)

Reputation: 886978

We can use Reduce with is.na and &

tab[!Reduce(`&`, lapply(tab, is.na))]
#   A  B
#1: 1 NA
#2: 3  3

Or a compact but not so efficient approach would be

tab[rowSums(!is.na(tab)) != 0L]

Also, as commented by @Frank, a join based approach,

tab[!tab[NA_integer_], on = names(tab)]

Upvotes: 6

Reputation: 21621

Another idea:

library(dplyr)
df %>% 
  filter(rowSums(is.na(.)) < length(.))

Upvotes: 2

Reputation: 165

I quite like

tab <- tab[sapply(1:nrow(tab), function(i){!all(is.na(tab[i,]))}),]

It is intuitive to me, but I'm not sure it is the fastest approach.

HTH

Upvotes: 0