Dnaiel

Reputation: 7832

retrieve specific entries of a matrix based on values from a data frame

I have a data frame of the form:

my.df = data.frame(ID=c(1,2,3,4,5,6,7), STRAND=c('+','+','+','-','+','-','+'), COLLAPSE=c(0,0,1,0,1,0,0))

and a square matrix of dimensions nrow(my.df) by nrow(my.df). It is a correlation matrix, but that's not important for the discussion.

For example:

mat = matrix(rnorm(n=nrow(my.df)*nrow(my.df),mean=1,sd=1), nrow = nrow(my.df), ncol=nrow(my.df))

The question is how to retrieve only those upper-triangle elements of matrix mat for which both corresponding rows of my.df have COLLAPSE == 0 and are on the same strand.

In this specific example, I'd be interested in retrieving the following entries from matrix mat in a vector:

mat[1,2]
mat[1,7]
mat[2,7]
mat[4,6]

The logic is as follows: rows 1 and 2 are on the same strand and both have a collapse value of zero, so mat[1,2] should be retrieved; row 3 would never be combined with any other row because its collapse value is 1; rows 1 and 7 are on the same strand and both have collapse value 0, so mat[1,7] should also be retrieved, and so on.

I could write a for loop but I am looking for a more crantastic way to achieve such results...

Upvotes: 1

Views: 133

Answers (3)

Henrik

Reputation: 67778

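# keep only the rows with COLLAPSE == 0
df <- my.df[my.df$COLLAPSE == 0, ]
strands <- c("+", "-")
# all within-strand pairs of IDs, one pair per row
# (the IDs double as row/column indices of mat in this example)
idx <- do.call(rbind, lapply(strands, function(s) {
  t(combn(x = df$ID[df$STRAND == s], m = 2))
}))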
idx
#      [,1] [,2]
# [1,]    1    2
# [2,]    1    7
# [3,]    2    7
# [4,]    4    6

mat[idx]
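
One caveat on the approach above: it uses the ID values themselves as matrix indices, and combn() treats a single number n as 1:n, so a strand with only one remaining row would silently produce wrong pairs. The following is only a sketch of one way to guard against both, assuming the rows of my.df are in the same order as the rows and columns of mat. The names keep, groups and pairs_for are made up for this illustration, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
my.df <- data.frame(ID = c(1, 2, 3, 4, 5, 6, 7),
                    STRAND = c('+', '+', '+', '-', '+', '-', '+'),
                    COLLAPSE = c(0, 0, 1, 0, 1, 0, 0))
n <- nrow(my.df)
mat <- matrix(rnorm(n * n, mean = 1, sd = 1), nrow = n, ncol = n)

# row positions (not ID values) of the rows with COLLAPSE == 0
keep <- which(my.df$COLLAPSE == 0)

# group those positions by strand
groups <- split(keep, as.character(my.df$STRAND[keep]))

# build a two-column (row, col) index for one group;
# combining seq_along(pos) avoids combn()'s "single number means 1:n" case
pairs_for <- function(pos) {
  if (length(pos) < 2) return(matrix(integer(0), ncol = 2))
  p <- combn(seq_along(pos), 2)
  cbind(pos[p[1, ]], pos[p[2, ]])
}

idx <- do.call(rbind, unname(lapply(groups, pairs_for)))
vals <- mat[idx]
```

Because the positions within each group are increasing, every pair has row < col, so only upper-triangle entries are picked. The order of the strand blocks in idx follows the sort order of the strand labels, which can depend on the locale.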

Upvotes: 1

Jean V. Adams

Reputation: 4784

Here's one way to do it.

# select only the 0 collapse records
sel <- my.df$COLLAPSE==0

# split the selected IDs by strand
groups <- split(my.df$ID[sel], my.df$STRAND[sel])

# generate all possible pairs of IDs within the same strand
pairs <- lapply(groups, combn, 2)

# subset the entries from the matrix
lapply(pairs, function(ij) mat[t(ij)])
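
The last line returns a list with one element per strand. Since the question asks for a single vector, the list could be flattened with unlist(). A minimal sketch, assuming the example data from the question; the seed is arbitrary and vals is just a name chosen here.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
my.df <- data.frame(ID = c(1, 2, 3, 4, 5, 6, 7),
                    STRAND = c('+', '+', '+', '-', '+', '-', '+'),
                    COLLAPSE = c(0, 0, 1, 0, 1, 0, 0))
n <- nrow(my.df)
mat <- matrix(rnorm(n * n, mean = 1, sd = 1), nrow = n, ncol = n)

sel <- my.df$COLLAPSE == 0
groups <- split(my.df$ID[sel], my.df$STRAND[sel])
pairs <- lapply(groups, combn, 2)

# one plain numeric vector instead of a list per strand
vals <- unlist(lapply(pairs, function(ij) mat[t(ij)]), use.names = FALSE)
```

The entries of one strand come first, followed by those of the other; which strand comes first depends on how the strand labels sort.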

Upvotes: 1

Sven Hohenstein

Reputation: 81693

Here's one way to do it using outer:

First, find indices with identical STRAND values and where COLLAPSE == 0:

idx <- with(my.df, outer(STRAND, STRAND, "==") &
              outer(COLLAPSE == 0, COLLAPSE == 0, "&"))

#       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]
# [1,] FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE
# [2,]  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE
# [3,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [4,] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
# [5,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [6,] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
# [7,]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE

Second, keep only the upper triangle (entries in the lower triangle and on the diagonal are treated as FALSE) and convert the result to a numeric index:

idx2 <- which(idx & upper.tri(idx), arr.ind = TRUE)
#      row col
# [1,]   1   2
# [2,]   4   6
# [3,]   1   7
# [4,]   2   7

Extract values (the numbers shown depend on the random draw used to build mat):

mat[idx2]
# [1] 1.72165093 0.05645659 0.74163428 3.83420241
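
To convince yourself that the index picks the right entries, the result can be compared with the explicit loop mentioned in the question. This is only a sketch on the example data: the seed is arbitrary, the names ok and expected are chosen here, and the COLLAPSE condition is written as a plain logical "&".

```r
set.seed(42)  # arbitrary seed, only so the example is reproducible
my.df <- data.frame(ID = c(1, 2, 3, 4, 5, 6, 7),
                    STRAND = c('+', '+', '+', '-', '+', '-', '+'),
                    COLLAPSE = c(0, 0, 1, 0, 1, 0, 0))
n <- nrow(my.df)
mat <- matrix(rnorm(n * n, mean = 1, sd = 1), nrow = n, ncol = n)

strand <- as.character(my.df$STRAND)
ok <- my.df$COLLAPSE == 0

# vectorised version: same strand AND both rows have COLLAPSE == 0
idx <- outer(strand, strand, "==") & outer(ok, ok, "&")
idx2 <- which(idx & upper.tri(idx), arr.ind = TRUE)

# reference version: loop over the upper triangle column by column,
# which is the same order in which which() reports the matches
expected <- numeric(0)
for (j in seq_len(n)) {
  for (i in seq_len(j - 1)) {
    if (ok[i] && ok[j] && strand[i] == strand[j]) {
      expected <- c(expected, mat[i, j])
    }
  }
}

same <- isTRUE(all.equal(mat[idx2], expected))
```

Here same should come out TRUE, and idx2 should contain the four pairs (1,2), (4,6), (1,7) and (2,7).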

Upvotes: 2
