using R to select rows in data set with matching missing observations

Question

I have determined how to identify all unique patterns of missing observations in a data set. Now I would like to select all rows in that data set with a given pattern of missing observations. I would like to do this iteratively so that if there are n patterns of missing observations in the data set I end up with n data sets each containing only 1 pattern of missing observations.

I know how to do this, but my method is not very efficient and is not general. I am hoping to learn a more efficient and more general approach because my real data sets are much larger and more variable than in the example below.

Here is an example data set and the code I am using. I do not bother to include the code I used to create the matrix zzz from the matrix dd, but can add that code if it helps.

dd  <- matrix(c( 
            1, 0, 1, 1,
            NA, 1, 1, 0, 
            NA, 0, 0, 0, 
            NA, 1,NA, 1, 
            NA, 1, 1, 1, 
             0, 0, 1, 0, 
            NA, 0, 0, 0, 
             0,NA,NA,NA, 
             1,NA,NA,NA, 
             1, 1, 1, 1, 
            NA, 1, 1, 0), 
nrow=11, byrow=T) 

zzz <- matrix(c(
             1, 1, 1, 1,
            NA, 1, 1, 1, 
            NA, 1,NA, 1, 
             1,NA,NA,NA
), nrow=4, byrow=T) 

for(jj in 1:dim(zzz)[1]) { 
ddd <- 
dd[ 
((dd[, 1]%in%c(0,1) & zzz[jj, 1]%in%c(0,1)) | 
    (is.na(dd[, 1]) & is.na(zzz[jj, 1]))) & 
((dd[, 2]%in%c(0,1) & zzz[jj, 2]%in%c(0,1)) | 
(is.na(dd[, 2]) & is.na(zzz[jj, 2]))) & 
((dd[, 3]%in%c(0,1) & zzz[jj, 3]%in%c(0,1)) | 
(is.na(dd[, 3]) & is.na(zzz[jj, 3]))) & 
((dd[, 4]%in%c(0,1) & zzz[jj, 4]%in%c(0,1)) | 
(is.na(dd[, 4]) & is.na(zzz[jj, 4]))),] 

print(ddd) 
}

The 4 resulting data sets in this example are:

a) 
1    0    1    1 
0    0    1    0 
1    1    1    1 

b) 
NA    1    1    0 
NA    0    0    0 
NA    1    1    1 
NA    0    0    0 
NA    1    1    0 

c) 
NA  1 NA  1 

d) 
0   NA   NA   NA 
1   NA   NA   NA

Is there a more general and more efficient method of doing the same thing? In the example above the 4 resulting data sets are not saved, but I do save them with my real data.

Thank you for any advice.

Mark Miller

Vincent Zoonekynd · Accepted Answer

# Missing value patterns (TRUE=missing, FALSE=present)
patterns <- unique( is.na(dd) )

result <- list()
for( i in seq_len(nrow(patterns))) {
  # Rows with this pattern
  rows <- apply( dd, 1, function(u) all( is.na(u) == patterns[i,] ) )
  result <- append( result, list(dd[rows,]) )
}

using R to select rows in data set with matching missing observations

Answers (2)

Related Questions