Mark Miller

Reputation: 13113

R: unique matrix rows by group

I would like to find the unique combinations of missing observations across the rows of a matrix, separately for each level of a grouping variable.

I can do this for the example data set with the sequence of subset, cbind and rbind commands shown below, which generates the matrix u3.

However, I suspect there is a much better way that does not involve 'manually' subsetting the matrix for each level of the group variable. I have tried the tapply command at the bottom, but cannot get it to work.

Thank you sincerely for any suggestions.

my.data  <- matrix(c( 

            1, 0, 1, 1, 1,
            NA, 1, 1, 0, 1,
            NA, 0, 0, 0, 1,
            NA, 1,NA, 1, 1,
            NA, 1, 1, 1, 1,
             0, 0, 1, 0, 1,
            NA, 0, 0, 0, 1,
             0,NA,NA,NA, 1,
             1,NA,NA,NA, 1,
             1, 1, 1, 1, 1,
            NA, 1, 1, 0, 1,

            1, 0, 1, 1, 2,
            1, 1, NA, 0, 2,
            NA, NA, NA, 0, 2,
            NA, NA,NA, 1, 2,
             1, 1, 1, NA, 2,
             0, 0, 1, 0, 2,
            NA, 0, 0, 0, 2,
             0,NA,NA,NA, 2,
             1,NA,NA,NA, 2,
             1, 1, 1, 1, 2,
             0, 1, 1, NA, 2

), 
nrow = 22, byrow = TRUE,
dimnames = list(NULL, c("c1","c2","c3","c4","my.group")))

my.data <- as.data.frame(my.data)
my.data

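# Group 1: unique rows of the missingness indicators, then prepend the group label
g1 <- subset(my.data, my.group == 1)
u1 <- unique( is.na(g1[1:4]) )
u1 <- cbind(1, u1)

# Group 2: same steps
g2 <- subset(my.data, my.group == 2)
u2 <- unique( is.na(g2[1:4]) )
u2 <- cbind(2, u2)

# Stack the per-group results
u3 <- rbind(u1, u2)
u3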


tapply(my.data[,1:4], my.data$my.group, function(x) {unique(is.na(x), 'rows') } )

Here is the matrix u3:

     c1 c2 c3 c4
1  1  0  0  0  0
2  1  1  0  0  0
4  1  1  0  1  0
8  1  0  1  1  1
12 2  0  0  0  0
13 2  0  0  1  0
14 2  1  1  1  0
16 2  0  0  0  1
18 2  1  0  0  0
19 2  0  1  1  1
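Why the tapply() attempt fails, as far as I can tell: tapply() requires its first argument to have the same length as the grouping vector, and length() of a data frame is its number of columns (4 here, not 22), so the call stops with a length-mismatch error. Separately, unique() on a matrix already compares whole rows (MARGIN = 1 by default); its second positional argument is incomparables, not a 'rows' flag. A minimal base-R sketch of the same "per group" idea, assuming split() + lapply() in place of tapply() (the names pieces and the column-wise re-entry of the data are illustrative, not from the question):

```r
# The question's example data, entered column by column
# (rows 1-11 are group 1, rows 12-22 are group 2)
my.data <- data.frame(
  c1 = c(1, NA, NA, NA, NA, 0, NA, 0, 1, 1, NA,   1, 1, NA, NA, 1, 0, NA, 0, 1, 1, 0),
  c2 = c(0, 1, 0, 1, 1, 0, 0, NA, NA, 1, 1,       0, 1, NA, NA, 1, 0, 0, NA, NA, 1, 1),
  c3 = c(1, 1, 0, NA, 1, 1, 0, NA, NA, 1, 1,      1, NA, NA, NA, 1, 1, 0, NA, NA, 1, 1),
  c4 = c(1, 0, 0, 1, 1, 0, 0, NA, NA, 1, 0,       1, 0, 0, 1, NA, 0, 0, NA, NA, 1, NA),
  my.group = rep(c(1, 2), each = 11)
)

# split() on a data frame splits its rows by group; lapply() then keeps the
# unique missingness patterns (TRUE = NA) within each piece
pieces <- lapply(split(my.data[1:4], my.data$my.group),
                 function(d) unique(is.na(d)))

# Prepend the group label to each piece and stack them,
# mirroring the cbind()/rbind() steps in the question
u3 <- do.call(rbind, Map(function(g, m) cbind(my.group = as.numeric(g), m),
                         names(pieces), pieces))
u3
```

Because cbind() mixes a number with a logical matrix, the result is a numeric 0/1 matrix with the same ten rows as u3 above, plus a named my.group column.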

Upvotes: 0

Views: 1051

Answers (1)

mathematical.coffee

Reputation: 56915

You can use the plyr package for this; it's fantastic for "apply this function to each group"-type applications. In particular, use the function ddply:

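library(plyr)
# For each level of my.group, ddply() passes that group's rows in as df
# and stacks the returned data frames, adding the my.group column
u3 <- ddply(my.data, .(my.group),
      function(df)
          data.frame(unique(is.na(df[1:4])))
      )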

Then u3 looks like this:

   my.group    c1    c2    c3    c4
1         1 FALSE FALSE FALSE FALSE
2         1  TRUE FALSE FALSE FALSE
3         1  TRUE FALSE  TRUE FALSE
4         1 FALSE  TRUE  TRUE  TRUE
5         2 FALSE FALSE FALSE FALSE
6         2 FALSE FALSE  TRUE FALSE
7         2  TRUE  TRUE  TRUE FALSE
8         2 FALSE FALSE FALSE  TRUE
9         2  TRUE FALSE FALSE FALSE
10        2 FALSE  TRUE  TRUE  TRUE

You can use as.matrix(u3) to get the numeric matrix (TRUE and FALSE become 1 and 0).
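A further base-R alternative, offered as a sketch rather than part of the answer above: since the group label can simply be one of the compared columns, a single unique() call over the group plus the NA indicators gives the per-group unique patterns directly as a numeric matrix, with no splitting and no as.matrix() step. This assumes the rows are already sorted by group (as in the example); otherwise the patterns come out in first-occurrence order rather than grouped.

```r
# The question's example data, entered column by column
# (rows 1-11 are group 1, rows 12-22 are group 2)
my.data <- data.frame(
  c1 = c(1, NA, NA, NA, NA, 0, NA, 0, 1, 1, NA,   1, 1, NA, NA, 1, 0, NA, 0, 1, 1, 0),
  c2 = c(0, 1, 0, 1, 1, 0, 0, NA, NA, 1, 1,       0, 1, NA, NA, 1, 0, 0, NA, NA, 1, 1),
  c3 = c(1, 1, 0, NA, 1, 1, 0, NA, NA, 1, 1,      1, NA, NA, NA, 1, 1, 0, NA, NA, 1, 1),
  c4 = c(1, 0, 0, 1, 1, 0, 0, NA, NA, 1, 0,       1, 0, 0, 1, NA, 0, 0, NA, NA, 1, NA),
  my.group = rep(c(1, 2), each = 11)
)

# Bind the group label to the missingness indicators and keep unique rows.
# my.group is one of the compared columns, so a pattern is only dropped when
# it has already appeared within the same group.
u3 <- unique(cbind(my.group = my.data$my.group, is.na(my.data[1:4])))
u3
```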

Upvotes: 2
