Reputation: 7832
I have a data frame of the form:
my.df = data.frame(ID=c(1,2,3,4,5,6,7), STRAND=c('+','+','+','-','+','-','+'), COLLAPSE=c(0,0,1,0,1,0,0))
and a matrix of dimensions nrow(my.df) by nrow(my.df). It is a correlation matrix, but that's not important for the discussion.
For example:
mat = matrix(rnorm(n=nrow(my.df)*nrow(my.df),mean=1,sd=1), nrow = nrow(my.df), ncol=nrow(my.df))
The question is how to retrieve only those upper-triangle elements of matrix mat for which both corresponding rows of my.df have COLLAPSE == 0 and are on the same strand.
In this specific example, I'd be interested in retrieving the following entries from matrix mat in a vector:
mat[1,2]
mat[1,7]
mat[2,7]
mat[4,6]
The logic is as follows: rows 1 and 2 are on the same strand and both have a COLLAPSE value of 0, so mat[1,2] should be retrieved. Row 3 is never combined with any other row because its COLLAPSE value is 1. Rows 1 and 7 are on the same strand and both have a COLLAPSE value of 0, so mat[1,7] should also be retrieved, and so on.
I could write a for loop but I am looking for a more crantastic way to achieve such results...
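For reference, the for-loop baseline mentioned above might look roughly like this. It is only a sketch of the selection rule described in the question; the `set.seed` call and the name `res` are assumptions added to make the example reproducible.

```r
# Example data from the question (the seed is an assumption, for reproducibility)
set.seed(42)
my.df <- data.frame(ID = c(1,2,3,4,5,6,7),
                    STRAND = c('+','+','+','-','+','-','+'),
                    COLLAPSE = c(0,0,1,0,1,0,0))
mat <- matrix(rnorm(n = nrow(my.df) * nrow(my.df), mean = 1, sd = 1),
              nrow = nrow(my.df), ncol = nrow(my.df))

n <- nrow(my.df)
res <- numeric(0)
# Walk the upper triangle only (j > i)
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    # Keep the entry when both rows have COLLAPSE == 0 and share a strand
    if (my.df$COLLAPSE[i] == 0 && my.df$COLLAPSE[j] == 0 &&
        my.df$STRAND[i] == my.df$STRAND[j]) {
      res <- c(res, mat[i, j])
    }
  }
}
res
```

With the example data this visits the pairs (1,2), (1,7), (2,7) and (4,6), in that order.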
Upvotes: 1
Views: 133
Reputation: 67778
# keep only the rows with COLLAPSE == 0
df <- my.df[my.df$COLLAPSE == 0, ]
strand <- c("+", "-")
# for each strand, build all pairs of IDs (one pair per row)
idx <- do.call(rbind, lapply(strand, function(s){
t(combn(x = df$ID[df$STRAND == s], m = 2))
}))
idx
# [,1] [,2]
# [1,] 1 2
# [2,] 1 7
# [3,] 2 7
# [4,] 4 6
mat[idx]
Upvotes: 1
Reputation: 4784
Here's one way to do it.
# select only the 0 collapse records
sel <- my.df$COLLAPSE==0
# split the data frame by strand
groups <- split(my.df$ID[sel], my.df$STRAND[sel])
# generate all possible pairs of IDs within the same strand
pairs <- lapply(groups, combn, 2)
# subset the entries from the matrix
lapply(pairs, function(ij) mat[t(ij)])
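The result above is a list with one element per strand. Since the question asks for a single vector, one possible follow-up is to flatten it with `unlist`. This is a sketch on the example data; the `set.seed` call and the name `vals` are assumptions, and it relies on ID matching the row position in mat and on every strand having at least two selected IDs (`combn` treats a single number n as 1:n).

```r
# Example data from the question (the seed is an assumption, for reproducibility)
set.seed(1)
my.df <- data.frame(ID = c(1,2,3,4,5,6,7),
                    STRAND = c('+','+','+','-','+','-','+'),
                    COLLAPSE = c(0,0,1,0,1,0,0))
mat <- matrix(rnorm(n = nrow(my.df) * nrow(my.df), mean = 1, sd = 1),
              nrow = nrow(my.df), ncol = nrow(my.df))

# select only the 0 collapse records and split their IDs by strand
sel <- my.df$COLLAPSE == 0
groups <- split(my.df$ID[sel], my.df$STRAND[sel])
# all pairs of IDs within each strand (2 x k matrix per strand)
pairs <- lapply(groups, combn, 2)
# extract per strand, then flatten the list into one unnamed vector
vals <- unlist(lapply(pairs, function(ij) mat[t(ij)]), use.names = FALSE)
vals
```

The order of the values follows the order of the strand groups produced by `split`, so it may differ from the order listed in the question.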
Upvotes: 1
Reputation: 81693
Here's one way to do it using outer:
First, find indices with identical STRAND values and where COLLAPSE == 0:
idx <- with(my.df, outer(STRAND, STRAND, "==") &
outer(COLLAPSE, COLLAPSE, Vectorize(function(x, y) !any(x, y))))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] FALSE TRUE FALSE FALSE FALSE FALSE TRUE
# [2,] TRUE FALSE FALSE FALSE FALSE FALSE TRUE
# [3,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [4,] FALSE FALSE FALSE FALSE FALSE TRUE FALSE
# [5,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [6,] FALSE FALSE FALSE TRUE FALSE FALSE FALSE
# [7,] TRUE TRUE FALSE FALSE FALSE FALSE FALSE
Second, set values in lower triangle and on the diagonal to FALSE. Create a numeric index:
idx2 <- which(idx & upper.tri(idx), arr.ind = TRUE)
# row col
# [1,] 1 2
# [2,] 4 6
# [3,] 1 7
# [4,] 2 7
Extract values:
mat[idx2]
# [1] 1.72165093 0.05645659 0.74163428 3.83420241
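As a variation on the same idea, the COLLAPSE condition can probably be written without Vectorize by comparing to 0 first and combining the logical vectors with "&". This is a sketch on the example data; the `set.seed` call and the names `keep`, `same_strand` and `both_keep` are assumptions.

```r
# Example data from the question (the seed is an assumption, for reproducibility)
set.seed(1)
my.df <- data.frame(ID = c(1,2,3,4,5,6,7),
                    STRAND = c('+','+','+','-','+','-','+'),
                    COLLAPSE = c(0,0,1,0,1,0,0))
mat <- matrix(rnorm(n = nrow(my.df) * nrow(my.df), mean = 1, sd = 1),
              nrow = nrow(my.df), ncol = nrow(my.df))

strand <- as.character(my.df$STRAND)   # works whether STRAND is a factor or character
keep <- my.df$COLLAPSE == 0            # TRUE for rows that may be paired

same_strand <- outer(strand, strand, "==")  # TRUE where rows i and j share a strand
both_keep <- outer(keep, keep, "&")         # TRUE where both rows have COLLAPSE == 0

# restrict to the upper triangle and convert to (row, col) index pairs
idx2 <- which(same_strand & both_keep & upper.tri(mat), arr.ind = TRUE)
mat[idx2]
```

As with the version above, `which` returns the pairs in column-major order: (1,2), (4,6), (1,7), (2,7).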
Upvotes: 2