Kaleb

Reputation: 1022

Mapping content of one matrix onto structure of another matrix

I have two matrices sourced from the same dataset but with different amounts of data available for each. I want to create a dataset that is a replicate of x in terms of column names and row names but which contains the data values in y. If the data is not available then an NA would be used as the value for that coordinate.

Not all of the row names in x are present in y and vice versa. The same holds true for the column names.

For the example input data given below, a row name in x corresponds to a row name in y when the part of the x row name before the | equals the y row name (I want to retain everything after the | for other mappings).

What is the most efficient way to do this?

DESIRED OUTPUT

z = structure(c(NA, 1, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, 
NA, 0, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), .Dim = c(11L, 5L), .Dimnames = list(
c("AACSL|729522", "AACS|65985", "AADACL2|344752", "AADACL3|126767", 
"AADACL4|343066", "AADAC|13", "AADAT|51166", "AAGAB|79719", 
"AAK1|22848", "AAK12|14", "AANAT|15"), c("S18", "S20", "S45", 
"S95", "S100")))

EXAMPLE INPUT

x = structure(c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 
1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 
0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0), .Dim = c(11L, 
5L), .Dimnames = list(c("AACSL|729522", "AACS|65985", "AADACL2|344752", 
"AADACL3|126767", "AADACL4|343066", "AADAC|13", "AADAT|51166", 
"AAGAB|79719", "AAK1|22848", "AAK12|14", "AANAT|15"), c("S18", 
"S20", "S45", "S95", "S100")))

y = structure(c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 
0, 0, 0, 0, 0, 0), .Dim = c(11L, 4L), .Dimnames = list(c("A1BG", 
"A1CF", "A2ML1", "A4GALT", "AACS", "AAK1", "AARD", "AARS2", "AASDHPPT", 
"AASS", "BAACS"), c("S18", "S10", "S45", "S95")))

Upvotes: 1

Views: 66

Answers (1)

AlexT

Reputation: 144

I think there might be a slight problem with the example you provided: I cannot see how z comes from the x and y above. See this code:

# Extract the letter code (the part before "|") from each row name of x,
# then keep only the codes that also appear in the row names of y
intersect(sapply(rownames(x),
                 function(i) {
                         strsplit(x = i, split = "|", fixed = TRUE)[[1]][[1]]
                 }),
          rownames(y))

#[1] "AACS" "AAK1"

Weird, right? I mean, only 2 of the codes in x also appear in y. However, I think the code below does what you are planning (with the exception of this inconsistency):

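library(reshape2)  # melt() and dcast()
library(dplyr)     # %>%, mutate() and select()

# Long-format copy of x that keeps the full row name plus the letter code (nms)
x %>% as.data.frame %>% mutate(rownames = rownames(x)) %>%
        mutate(nms = sapply(rownames(x),
                            function(i) {
                                    strsplit(x = i, split = "|", fixed = TRUE)[[1]][[1]]
                            })) %>%
        melt(id.vars = c("nms", "rownames")) %>%
        # Left join with the long-format y on sample (variable) and letter code;
        # cells without a match in y become NA
        merge(., y %>% as.data.frame %>% mutate(nms = rownames(y)) %>% melt(id.vars = "nms"),
              by = c("variable", "nms"), all.x = TRUE) %>%
        # Drop the values of x and reshape back to wide format
        select(-nms, -value.x) %>% dcast(formula = rownames ~ variable, value.var = "value.y") -> xy
# now put the row names back where they belong
rownames(xy) <- xy$rownames
# now the only thing left is to restore the row and column order of x
xy[rownames(x), colnames(x)] -> xy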
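
As a possible alternative, here is a minimal base R sketch of the same mapping that avoids reshaping altogether. It assumes the code before the first | is what identifies a row in y, and it relies on the fact that indexing a matrix with NA positions returns NA rows or columns. The names map_onto, x_demo and y_demo are hypothetical and only there for illustration; they are not part of the question or of any package.

```r
# Hypothetical helper: return a matrix shaped like `x` whose values are looked
# up in `y`; cells with no matching row or column in `y` are NA.
map_onto <- function(x, y) {
  # Letter code = everything before the first "|" in each row name of x
  codes <- sub("\\|.*$", "", rownames(x))
  row_idx <- match(codes, rownames(y))        # NA where the code is absent from y
  col_idx <- match(colnames(x), colnames(y))  # NA where the sample is absent from y
  z <- y[row_idx, col_idx, drop = FALSE]      # NA indices yield NA rows/columns
  dimnames(z) <- dimnames(x)                  # restore the full names used by x
  z
}

# Small illustrative inputs (made up for this sketch)
x_demo <- matrix(0, nrow = 3, ncol = 3,
                 dimnames = list(c("AACS|65985", "AADAC|13", "AAK1|22848"),
                                 c("S18", "S20", "S45")))
y_demo <- matrix(c(1, 0, 2, 3), nrow = 2,
                 dimnames = list(c("AACS", "AAK1"), c("S18", "S45")))

z_demo <- map_onto(x_demo, y_demo)
print(z_demo)
```

Called as map_onto(x, y) on the matrices from the question, this should give a matrix with the row and column names of x, with values only for AACS and AAK1 in the samples that y also has. Unlike the pipeline above, the result stays a matrix rather than a data frame.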

Or am I wrong in understanding some of your points?

Upvotes: 1
