Reputation: 189
Although my problem seems to be easy I struggle with it for quite a while now. I have a symmetrical matrix which contains P-values. Now I want to remove all rows / columns from the matrix which does NOT contain a value over a definite threshold.
Example matrix:
test <- c(1.0000000000, 0.001996328, 0.000176308, 0.0002305861, 0.1514324000,
0.0019963281, 1.000000000, 0.007106454, 0.409054300, 0.001210349,
0.0001763080, 0.007106454, 1.000000000, 0.217609400, 0.185434400,
0.0002305861, 0.409054269, 0.217609401, 1.000000, 1.972118e-09,
0.1514324468, 0.001210349, 0.185434396, 1.972118e-09, 1.000000)
m <- matrix(test, nrow=5, ncol=5)
genes <- c("geneA", "geneB", "geneC", "geneD", "geneE")
rownames(m) <- genes
colnames(m) <- genes
m
This will lead to this:
geneA geneB geneC geneD geneE
geneA 1.0000000000 0.001996328 0.000176308 2.305861e-04 1.514324e-01
geneB 0.0019963280 1.000000000 0.007106454 4.090543e-01 1.210349e-03
geneC 0.0001763080 0.007106454 1.000000000 2.176094e-01 1.854344e-01
geneD 0.0002305861 0.409054300 0.217609400 1.000000e+00 1.972118e-09
geneE 0.1514324000 0.001210349 0.185434400 1.972118e-09 1.000000e+00
Now I want to remove all rows / columns wich does NOT contain a value between -0.001 and 0.001. So in this sample matrix row and column "gene B" should be removed.
Some of the code snippets I tried looks like this:
test <- finalPScore[(abs(finalPScore[]) < 0.001)]
But then "test" is a numeric, containing the actual numbers.
test <- finalPScore[(abs(finalPScore[]) < 0.001),(abs(finalPScore[]) < 0.001)]
Error message: "(subscript) logical subscript too long" I also tried an approach via the names:
test <- (abs(finalPScore[]) < 0.001)
for(i in colnames(test)) { if(test[,i] == 1) { print( i ) } }
But then it only checks the first line for every column...
blub <- rownames(finalPScore[abs(finalPScore[]) < 0.001])
Doesn't work at all.
Anyhow, it seems to me that I'm missing something very easy but important. Could you help me with this?
PS: R is a must since I'm doing analyses before and will do plots afterwards. Also the original matrix is too big to export / write temporarily.
Upvotes: 2
Views: 1131
Reputation: 8425
I would do this as follows:
First, write a helper function that embodies the knowledge of the test: i'm calling this keepers.
keepers <- function(M, dim) apply(M, dim, function(X) abs(min(X)) < 0.001)
and apply it as follows:
> m[keepers(m, 1), keepers(m,2)]
geneA geneC geneD geneE
geneA 1.0000000000 0.000176308 2.305861e-04 1.514324e-01
geneC 0.0001763080 1.000000000 2.176094e-01 1.854344e-01
geneD 0.0002305861 0.217609400 1.000000e+00 1.972118e-09
geneE 0.1514324000 0.185434400 1.972118e-09 1.000000e+00
Note that i've written this function so that it can be applied to both columns and rows by setting the dim
value. You could make a more complex function that combines both and returns a matrix, but that would be more difficult to understand (which is a cost) and it's not necessary.
An advantage of using this type of function is that the solution works for higher order arrays (given your data i would imagine that 3d arrays are common).
Upvotes: 1
Reputation: 4614
Here is my take on that. This line finds columns (rows) we want to keep:
keep = (colSums(abs(m) < 0.001) > 0)
Checking the results:
genes[keep]
m[keep,keep]
Upvotes: 5
Reputation: 193497
There's probably a better way to do this, but here is one approach:
Y <- sort(unique(rownames(which(abs(m) < 0.001, arr.ind=TRUE))))
m[Y, Y]
# geneA geneC geneD geneE
# geneA 1.0000000000 0.000176308 2.305861e-04 1.514324e-01
# geneC 0.0001763080 1.000000000 2.176094e-01 1.854344e-01
# geneD 0.0002305861 0.217609400 1.000000e+00 1.972118e-09
# geneE 0.1514324000 0.185434400 1.972118e-09 1.000000e+00
Upvotes: 1