Reputation: 843
I am looking for efficient code that matches the rows of a matrix with its columns and builds a new vector from all the matches.
I have a very large matrix. As an example, let's say there is a matrix called "test" like this one:
row <- c("aaa" , "bbb" , "ccc")
column <- paste(rep(row, each = 3) , rep(c(1:3) , times = 3) , sep = "_")
test <- matrix(rexp(27), nrow = 3 , ncol = 9)
colnames(test) <- column
rownames(test) <- row
In my case, this gives the matrix test:
aaa_1 aaa_2 aaa_3 bbb_1 bbb_2 bbb_3 ccc_1 ccc_2 ccc_3
aaa 0.08695083 0.5360101 0.2151808 0.2623833 0.05368126 3.5040455 0.3966199 1.1790225 0.16178868
bbb 0.26399994 0.2753358 0.3457663 2.1826606 0.73636302 0.8346718 0.9535214 0.4230223 1.59374844
ccc 0.84269411 0.1526342 0.5687740 0.7493685 0.68945927 2.7006906 0.6448158 1.0599139 0.05998212
So I would like to create a new vector called test1 that pairs each row with the columns whose names contain the same string. In my case I could use:
test1 <- c(test[grep("aaa" , rownames(test)) , grep("aaa" , colnames(test))] ,
test[grep("bbb" , rownames(test)) , grep("bbb" , colnames(test))] ,
test[grep("ccc" , rownames(test)) , grep("ccc", colnames(test))])
names(test1) <- column
and this would give me:
aaa_1 aaa_2 aaa_3 bbb_1 bbb_2 bbb_3 ccc_1 ccc_2
0.08695083 0.53601009 0.21518077 2.18266063 0.73636302 0.83467182 0.64481582 1.05991387
ccc_3
0.05998212
But this approach does not scale: with a gigantic matrix I would have to write one line per row name. Is there a more efficient way of doing this?
Also, in this example every string has three characters, but in my matrix the strings have differing lengths. Thanks for the help!
Upvotes: 2
Views: 56
Reputation: 886938
We could treat the matrix as a table, convert it to long form with as.data.frame.table, use subset to keep the elements whose column name (minus the numeric suffix) equals the row name, and convert the result to a named vector with setNames:
with(subset(as.data.frame.table(test), sub("_\\d+", "", Var2) ==
Var1, select = 2:3), setNames(Freq, Var2))
-output
# aaa_1 aaa_2 aaa_3 bbb_1 bbb_2 bbb_3 ccc_1 ccc_2 ccc_3
#1.6563422 3.1174159 3.5855340 0.3218447 0.5638403 2.7593073 0.1595813 0.2933381 0.4131952
NOTE: The values differ from those in the question because no seed was set with set.seed.
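Since the example data are random, fixing a seed makes the result checkable. The following is a minimal sketch of the same long-form idea, not a definitive implementation. The seed value 42 and the helper names long and keep are assumptions made for illustration, and the suffix pattern is anchored with $ so that only a trailing "_<digits>" is stripped:

```r
set.seed(42)  # arbitrary seed, assumed here only for reproducibility

row <- c("aaa", "bbb", "ccc")
column <- paste(rep(row, each = 3), rep(1:3, times = 3), sep = "_")
test <- matrix(rexp(27), nrow = 3, ncol = 9, dimnames = list(row, column))

# Long form: Var1 = row name, Var2 = column name, Freq = cell value
long <- as.data.frame.table(test)

# Keep cells whose column name, minus the trailing "_<digits>", equals the row name
keep <- sub("_\\d+$", "", as.character(long$Var2)) == as.character(long$Var1)

test1 <- setNames(long$Freq[keep], as.character(long$Var2[keep]))
```

Because the prefix is compared for equality rather than searched for with grep, row names of differing lengths should be handled as long as every column name has the form "<row name>_<digits>".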
Upvotes: 0
Reputation: 39858
One option could be:
test[t(sapply(rownames(test), grepl, colnames(test)))]
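Note that grepl matches substrings, so a row name that is contained in another (for example "aa" and "aaa") would select extra columns. A possible variation, sketched here under the assumption that every column name is "<row name>_<suffix>", looks up each column's row by exact prefix and extracts the cells with a two-column index matrix. The names m, rn, cn, prefix, idx and res are hypothetical and not part of the question:

```r
set.seed(1)  # arbitrary seed, assumed only for reproducibility

# Hypothetical data with row names of differing lengths, one a substring of another
rn <- c("aa", "aaa", "b")
cn <- paste(rep(rn, each = 2), rep(1:2, times = 3), sep = "_")
m <- matrix(rexp(18), nrow = 3, ncol = 6, dimnames = list(rn, cn))

# Strip everything from the last underscore onwards to recover the row name
prefix <- sub("_[^_]*$", "", colnames(m))

# One (row, column) pair per column; match() does an exact lookup
idx <- cbind(match(prefix, rownames(m)), seq_len(ncol(m)))

# Matrix indexing pulls one cell per pair; keep the column names as names
res <- setNames(m[idx], colnames(m))
```

This avoids building a full logical matrix of the same size as the data, which may matter for a very large matrix. A column whose prefix has no matching row would yield NA under this sketch.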
Upvotes: 1