Remove rows from data.table in R based on values of several columns

Question

I have a data.table in R which has several ids and a value. For each combination of ids, there are several rows. If one of these rows contains NA in the column 'value', I would like to remove all rows with this combination of ids. For example, in the table below, I would like to remove all rows for which id1 == 2 and id2 == 1.

If I had only one id I would do dat[!(id1 %in% dat[is.na(value),id1])]. In the example, this would remove all rows where i1 == 2. However, I did not manage to include several columns.

dat <- data.table(id1 = c(1,1,2,2,2,2),
                  id2 = c(1,2,1,2,3,1),
                  value = c(5,3,NA,6,7,3))

David Arenburg · Accepted Answer

If you want to check per combination of id1 and id2 if any of the values are NAs and then remove that whole combination, you can insert an if statement per group and only retrieve the results (using .SD) if that statement returns TRUE.

dat[, if(!anyNA(value)) .SD, by = .(id1, id2)]
#    id1 id2 value
# 1:   1   1     5
# 2:   1   2     3
# 3:   2   2     6
# 4:   2   3     7

Or similarly,

dat[, if(all(!is.na(value))) .SD, by = .(id1, id2)]

Remove rows from data.table in R based on values of several columns

Answers (1)

Related Questions