Adrian

Reputation: 843

Matching rownames with a string contained within a column

I am looking for efficient code that matches the row names of a matrix against its column names and collects all the matching elements in a new vector.

I have a very large matrix. As an example, let's say there is a matrix called "test" like this one:

row <- c("aaa" , "bbb" , "ccc")
column <- paste(rep(row, each = 3) , rep(c(1:3) , times = 3) , sep = "_")

test <- matrix(rexp(27), nrow = 3 , ncol = 9)

colnames(test) <- column
rownames(test) <- row

In my case, this gives matrix test:

         aaa_1     aaa_2     aaa_3     bbb_1      bbb_2     bbb_3     ccc_1     ccc_2      ccc_3
aaa 0.08695083 0.5360101 0.2151808 0.2623833 0.05368126 3.5040455 0.3966199 1.1790225 0.16178868
bbb 0.26399994 0.2753358 0.3457663 2.1826606 0.73636302 0.8346718 0.9535214 0.4230223 1.59374844
ccc 0.84269411 0.1526342 0.5687740 0.7493685 0.68945927 2.7006906 0.6448158 1.0599139 0.05998212

I would like to create a new vector called test1 that takes, for each column, the value from the row whose name matches the string at the start of the column name. In my case I could use:

test1 <- c(test[grep("aaa" , rownames(test)) , grep("aaa" , colnames(test))] , 
           test[grep("bbb" , rownames(test)) , grep("bbb" , colnames(test))] ,
           test[grep("ccc" , rownames(test)) , grep("ccc", colnames(test))])

names(test1) <- column

and this would give me:

     aaa_1      aaa_2      aaa_3      bbb_1      bbb_2      bbb_3      ccc_1      ccc_2 
0.08695083 0.53601009 0.21518077 2.18266063 0.73636302 0.83467182 0.64481582 1.05991387 
     ccc_3 
0.05998212 

But this code is impractical for a gigantic matrix and vector. Is there a more efficient way of doing this?

Also, in this example every string has three characters, but in my matrix the strings have differing lengths. Thanks for the help!

Upvotes: 2

Views: 56

Answers (2)

akrun

Reputation: 886938

We could convert the matrix to long form with as.data.frame.table, use subset to keep the elements whose column name (with the numeric suffix removed) equals the row name, and convert the result to a named vector with setNames

with(subset(as.data.frame.table(test), sub("_\\d+$", "", Var2) == Var1,
            select = 2:3), setNames(Freq, Var2))

-output

#   aaa_1     aaa_2     aaa_3     bbb_1     bbb_2     bbb_3     ccc_1     ccc_2     ccc_3 
#1.6563422 3.1174159 3.5855340 0.3218447 0.5638403 2.7593073 0.1595813 0.2933381 0.4131952 

NOTE: The values differ from those in the question because no seed was set with set.seed
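
The one-liner can also be unpacked step by step. This is only a sketch: it assumes every column name has the form rowname_digits, and the seed value and the names long and keep are illustrative choices that are not part of the question.

```r
set.seed(1)  # assumption: arbitrary seed, only so that repeated runs agree

row <- c("aaa", "bbb", "ccc")
column <- paste(rep(row, each = 3), rep(1:3, times = 3), sep = "_")
test <- matrix(rexp(27), nrow = 3, ncol = 9, dimnames = list(row, column))

# Long form: one row per cell, with Var1 = row name, Var2 = column name,
# Freq = cell value (cells are listed in column-major order)
long <- as.data.frame.table(test)

# TRUE where the column name, minus its trailing "_<digits>", equals the row name
keep <- sub("_\\d+$", "", as.character(long$Var2)) == as.character(long$Var1)

# Named vector of the matching cells
test1 <- setNames(long$Freq[keep], as.character(long$Var2[keep]))
```

Because the long form has one row per cell, it needs several times the memory of the matrix itself, which may matter for a very large input.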

Upvotes: 0

tmfmnk

Reputation: 39858

One option could be:

test[t(sapply(rownames(test), grepl, colnames(test)))]
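
Two caveats: grepl matches substrings, so a row name contained in another one (say "aa" and "aaa") would select extra cells, and the result carries no names. A possible variant, sketched under the assumption that each column name is a row name followed by one final "_suffix", looks up each column's row by exact name and indexes the matrix with a two-column (row, column) matrix. The names prefix, i and ok are illustrative.

```r
row <- c("aaa", "bbb", "ccc")
column <- paste(rep(row, each = 3), rep(1:3, times = 3), sep = "_")
test <- matrix(rexp(27), nrow = 3, ncol = 9, dimnames = list(row, column))

# Drop the last "_<suffix>" from each column name to recover the row name;
# this works for row names of any length
prefix <- sub("_[^_]*$", "", colnames(test))

# Row index for every column (NA when no row has exactly that name)
i <- match(prefix, rownames(test))
ok <- !is.na(i)

# Matrix indexing: one (row, column) pair per matched column
test1 <- test[cbind(i[ok], which(ok))]
names(test1) <- colnames(test)[ok]
```

This does a single vectorised lookup per column instead of one regular-expression pass per row name, so it should scale better when there are many rows, although that has not been benchmarked here.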

Upvotes: 1
