Martingales
Martingales

Reputation: 189

Filter a symmetric matrix depending on a threshold

Although my problem seems to be easy I struggle with it for quite a while now. I have a symmetrical matrix which contains P-values. Now I want to remove all rows / columns from the matrix which does NOT contain a value over a definite threshold.

Example matrix:

test <- c(1.0000000000, 0.001996328, 0.000176308, 0.0002305861, 0.1514324000,
0.0019963281, 1.000000000, 0.007106454, 0.409054300, 0.001210349,
0.0001763080, 0.007106454, 1.000000000, 0.217609400, 0.185434400,
0.0002305861, 0.409054269, 0.217609401, 1.000000, 1.972118e-09,
0.1514324468, 0.001210349, 0.185434396, 1.972118e-09, 1.000000)
m <- matrix(test, nrow=5, ncol=5)
genes <- c("geneA", "geneB", "geneC", "geneD", "geneE")
rownames(m) <- genes
colnames(m) <- genes
m

This will lead to this:

            geneA        geneB       geneC       geneD        geneE
      geneA 1.0000000000 0.001996328 0.000176308 2.305861e-04 1.514324e-01
      geneB 0.0019963280 1.000000000 0.007106454 4.090543e-01 1.210349e-03
      geneC 0.0001763080 0.007106454 1.000000000 2.176094e-01 1.854344e-01
      geneD 0.0002305861 0.409054300 0.217609400 1.000000e+00 1.972118e-09
      geneE 0.1514324000 0.001210349 0.185434400 1.972118e-09 1.000000e+00

Now I want to remove all rows / columns wich does NOT contain a value between -0.001 and 0.001. So in this sample matrix row and column "gene B" should be removed.

Some of the code snippets I tried looks like this:

test <- finalPScore[(abs(finalPScore[]) < 0.001)]

But then "test" is a numeric, containing the actual numbers.

test <- finalPScore[(abs(finalPScore[]) < 0.001),(abs(finalPScore[]) < 0.001)]

Error message: "(subscript) logical subscript too long" I also tried an approach via the names:

test <- (abs(finalPScore[]) < 0.001)
for(i in colnames(test)) { if(test[,i] == 1) { print( i ) } }

But then it only checks the first line for every column...

blub <- rownames(finalPScore[abs(finalPScore[]) < 0.001])

Doesn't work at all.

Anyhow, it seems to me that I'm missing something very easy but important. Could you help me with this?

PS: R is a must since I'm doing analyses before and will do plots afterwards. Also the original matrix is too big to export / write temporarily.

Upvotes: 2

Views: 1131

Answers (3)

ricardo
ricardo

Reputation: 8425

I would do this as follows:

First, write a helper function that embodies the knowledge of the test: i'm calling this keepers.

keepers <- function(M, dim) apply(M, dim, function(X) abs(min(X)) < 0.001)

and apply it as follows:

> m[keepers(m, 1), keepers(m,2)]
             geneA       geneC        geneD        geneE
geneA 1.0000000000 0.000176308 2.305861e-04 1.514324e-01
geneC 0.0001763080 1.000000000 2.176094e-01 1.854344e-01
geneD 0.0002305861 0.217609400 1.000000e+00 1.972118e-09
geneE 0.1514324000 0.185434400 1.972118e-09 1.000000e+00

Note that i've written this function so that it can be applied to both columns and rows by setting the dim value. You could make a more complex function that combines both and returns a matrix, but that would be more difficult to understand (which is a cost) and it's not necessary.

An advantage of using this type of function is that the solution works for higher order arrays (given your data i would imagine that 3d arrays are common).

Upvotes: 1

Andrey Shabalin
Andrey Shabalin

Reputation: 4614

Here is my take on that. This line finds columns (rows) we want to keep:

keep = (colSums(abs(m) < 0.001) > 0)

Checking the results:

genes[keep]
m[keep,keep]

Upvotes: 5

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193497

There's probably a better way to do this, but here is one approach:

Y <- sort(unique(rownames(which(abs(m) < 0.001, arr.ind=TRUE))))
m[Y, Y]
#              geneA       geneC        geneD        geneE
# geneA 1.0000000000 0.000176308 2.305861e-04 1.514324e-01
# geneC 0.0001763080 1.000000000 2.176094e-01 1.854344e-01
# geneD 0.0002305861 0.217609400 1.000000e+00 1.972118e-09
# geneE 0.1514324000 0.185434400 1.972118e-09 1.000000e+00

Upvotes: 1

Related Questions