user1723765

Reputation: 6409

Tricky multi-step subset selection

I have a matrix:

1 3  NA
1 2  0
1 7  2
1 5  NA
1 9 5
1 6  3
2 5  2
2 6  1
3 NA  4
4 2  9
...

For each number in the first column, I would like to take its corresponding values in the second column, look those values up in the first column, and select the rows in which they have an NA in their own second column.

So the search would go the following way:

  1. look up number in the first column: 1.
  2. check corresponding values in second column: 3,2,7,5,9,6...
  3. look up 3,2,7,5,9,6 in first column and see if they have NA in their second column

The result in the above case would be:

3 NA  4

Since 3 is the only one of these values that has NA in its own second column.

Here's what I want to do in words:

  1. Look at the number in column one, I find '1'.

  2. What numbers does 1 have in its second column: 3,2,7,5,9,6

  3. Do these numbers have NA in their own second column? Yes, 3 has an NA.

  4. I would like it to return those numbers, not row numbers.

  5. The result would be the subset of the original matrix containing the rows that satisfy the condition.

This would be the matlab equivalent, where i is the number in column 1:

isnan(matrix(matrix(:,1)==i,2))==1

Upvotes: 0

Views: 110

Answers (3)

flodel

Reputation: 89097

This hopefully reads easily as it follows the steps you described:

idx1 <- m[, 1L] == 1L
idx2 <- m[, 1L] %in% m[idx1, 2L]
idx3 <- idx2 & is.na(m[, 2L])
m[idx3, ]
# V1 V2 V3 
#  3 NA  4

It is all vectorized and uses integer comparison so it should not be terribly slow. However, if it is too slow for your needs, you should use a data.table and use your first column as the key.

Note that you don't need any of the assignments, so if you are looking for a one-liner:

m[is.na(m[, 2L]) & m[, 1L] %in% m[m[, 1L] == 1L, 2L], ]
# [1]  3 NA  4

(but definitely harder to read and maintain.)
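If the same lookup is needed for every value in column one rather than only for 1, the steps above could be wrapped in a small function. This is only a sketch: the helper name `select_na_targets` is hypothetical, and it assumes `m` is a numeric matrix laid out like the example in the question.

```r
# Example matrix from the question (rebuilt here so the sketch is self-contained).
m <- matrix(c(1, 3, NA,
              1, 2, 0,
              1, 7, 2,
              1, 5, NA,
              1, 9, 5,
              1, 6, 3,
              2, 5, 2,
              2, 6, 1,
              3, NA, 4,
              4, 2, 9),
            ncol = 3, byrow = TRUE)

# Hypothetical helper: for a value i in column 1, return the rows whose
# column-1 value appears among i's column-2 values AND whose own column 2 is NA.
select_na_targets <- function(m, i) {
  targets <- m[which(m[, 1] == i), 2]      # column-2 values belonging to i
  targets <- targets[!is.na(targets)]      # an NA is not a value to look up
  keep <- m[, 1] %in% targets & is.na(m[, 2])
  m[keep, , drop = FALSE]                  # drop = FALSE keeps a matrix even for one row
}

# Apply the helper to every distinct value in column 1.
groups <- unique(m[, 1])
results <- setNames(lapply(groups, function(i) select_na_targets(m, i)), groups)

results[["1"]]   # the single row 3 NA 4
results[["2"]]   # a matrix with zero rows
```

Using `drop = FALSE` means every element of `results` is a matrix, so the pieces can be combined afterwards with `do.call(rbind, results)` without special-casing single-row or empty results.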

Upvotes: 2

agstudy

Reputation: 121588

Using by to compute the result for each group of column 1, assuming dat is your data frame:

by(dat,dat$V1,FUN=function(x){
                  y <- dat[which(dat$V1 %in% x$V2),]
                  y[is.na(y$V2),]
})

dat$V1: 1
  V1 V2 V3
9  3 NA  4
-------------------------------------------------------------------------------- 
dat$V1: 2
[1] V1 V2 V3
<0 rows> (or 0-length row.names)
-------------------------------------------------------------------------------- 
dat$V1: 3
[1] V1 V2 V3
<0 rows> (or 0-length row.names)
-------------------------------------------------------------------------------- 
dat$V1: 4
[1] V1 V2 V3
<0 rows> (or 0-length row.names)

EDIT

Here I try to reproduce the MATLAB command in R:

  isnan(matrix(matrix(:,1)==i,2))==1    ## what is i here?

  is.na(dat[dat[dat[,1]==1,2],])        ## R equivalent, I set i = 1

     V1    V2    V3
3 FALSE FALSE FALSE
2 FALSE FALSE FALSE
7 FALSE FALSE FALSE
5 FALSE FALSE FALSE
9 FALSE  TRUE FALSE
6 FALSE FALSE FALSE
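One thing worth checking in the expression above: `dat[dat[dat[,1]==1,2],]` uses the column-two values as row positions, so the TRUE appears at position 9 because the tenth-to-last line of the example data happens to sit in row 9, not because 9 was looked up in column one. A minimal sketch of the value-based lookup, assuming `dat` is read from the question's data with the default column names V1, V2, V3:

```r
# Example data from the question, read as a data frame.
dat <- read.table(text = "1 3 NA
1 2 0
1 7 2
1 5 NA
1 9 5
1 6 3
2 5 2
2 6 1
3 NA 4
4 2 9")

# Column-2 values that belong to the group V1 == 1.
targets <- dat[dat$V1 == 1, "V2"]

# Positional indexing: rows number 3, 2, 7, 5, 9, 6 of dat.
by_position <- dat[targets, ]

# Value-based lookup: rows whose V1 is one of the target values.
by_value <- dat[dat$V1 %in% targets, ]

# Keep only those looked-up rows that have NA in their own second column.
res <- by_value[is.na(by_value$V2), ]
res
```

With this data both approaches end up pointing at the row `3 NA 4`, but only the `%in%` version matches on the numbers themselves, which is what the question asks for.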

Upvotes: 2

nograpes

Reputation: 18323

I am still not totally clear as to what you want, but maybe this would work?

m<-read.table(
textConnection("1 3  NA
1 2  0
1 7  2
1 5  NA
1 9 5
1 6  3
2 5  2
2 6  1
3 NA  4
4 2  9"))

do.call(rbind,lapply(split(m[,2],m[,1]),function(x) m[x[!is.na(x)][is.na(m[x[!is.na(x)],2])],]))

#   V1 V2 V3
# 1  3 NA  4

It would be much nicer if you provided an example whose expected result has more than one row.

Upvotes: 0
