I have two matrices sourced from the same dataset but with different amounts of data available for each.
I want to create a matrix that replicates x in terms of its column names and row names but contains the data values from y. If a value is not available in y, NA should be used for that coordinate.
Not all of the row names in x are present in y, and vice versa. The same holds true for the column names.
In the example input below, a row name in x corresponds to a row name in y through the part before the | (for example, AACS|65985 in x corresponds to AACS in y). I want to retain everything after the | for other mappings.
What is the most efficient way to do this?
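To make the mapping concrete, here is a small sketch of splitting a few of the example row names of x into the key used in y and the suffix to keep. It uses base R's sub() with a regular expression; the variable names x_rows, keys and ids are illustrative only.

```r
# A few row names taken from the example x
x_rows <- c("AACSL|729522", "AACS|65985", "AAK1|22848")

# Key used in y: drop the first "|" and everything after it
keys <- sub("\\|.*", "", x_rows)
print(keys)  # returns "AACSL" "AACS" "AAK1"

# Suffix after the last "|", still available for other mappings
ids <- sub(".*\\|", "", x_rows)
print(ids)   # returns "729522" "65985" "22848"
```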
DESIRED OUTPUT
```r
z = structure(c(NA, 1, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, NA, NA,
NA, 0, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), .Dim = c(11L, 5L), .Dimnames = list(
c("AACSL|729522", "AACS|65985", "AADACL2|344752", "AADACL3|126767",
"AADACL4|343066", "AADAC|13", "AADAT|51166", "AAGAB|79719",
"AAK1|22848", "AAK12|14", "AANAT|15"), c("S18", "S20", "S45",
"S95", "S100")))
```
EXAMPLE INPUT
```r
x = structure(c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1,
1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0,
0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0), .Dim = c(11L,
5L), .Dimnames = list(c("AACSL|729522", "AACS|65985", "AADACL2|344752",
"AADACL3|126767", "AADACL4|343066", "AADAC|13", "AADAT|51166",
"AAGAB|79719", "AAK1|22848", "AAK12|14", "AANAT|15"), c("S18",
"S20", "S45", "S95", "S100")))
y = structure(c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0), .Dim = c(11L, 4L), .Dimnames = list(c("A1BG",
"A1CF", "A2ML1", "A4GALT", "AACS", "AAK1", "AARD", "AARS2", "AASDHPPT",
"AASS", "BAACS"), c("S18", "S10", "S45", "S95")))
```
I think there might be a slight problem with the example you provided: I cannot see how z comes from the x and y above. See this code:
```r
# I am just extracting the letter codes here (the part before "|")
intersect(sapply(rownames(x), function(i) {
  strsplit(x = i, split = "|", fixed = TRUE)[[1]][[1]]
}), rownames(y))
#[1] "AACS" "AAK1"
```
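The same check can be written without sapply()/strsplit() by stripping the suffix with sub(), and it is worth checking the column names at the same time. A minimal sketch, assuming only the row and column names copied from the example x and y (the vector names are illustrative):

```r
# Row and column names copied from the example x and y
x_rows <- c("AACSL|729522", "AACS|65985", "AADACL2|344752", "AADACL3|126767",
            "AADACL4|343066", "AADAC|13", "AADAT|51166", "AAGAB|79719",
            "AAK1|22848", "AAK12|14", "AANAT|15")
y_rows <- c("A1BG", "A1CF", "A2ML1", "A4GALT", "AACS", "AAK1", "AARD",
            "AARS2", "AASDHPPT", "AASS", "BAACS")
x_cols <- c("S18", "S20", "S45", "S95", "S100")
y_cols <- c("S18", "S10", "S45", "S95")

# Vectorised key extraction: remove the first "|" and everything after it
shared_rows <- intersect(sub("\\|.*", "", x_rows), y_rows)
shared_cols <- intersect(x_cols, y_cols)

print(shared_rows)  # returns "AACS" "AAK1"
print(shared_cols)  # returns "S18" "S45" "S95"
```

Because S20 and S100 do not exist in y, those columns of the result can only ever contain NA.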
Weird, right? I mean, only 2 of the codes in y match those in x. However, I think the code below does what you are planning (apart from this inconsistency):
```r
library(data.table)
library(reshape2)
library(dplyr)

x %>% as.data.frame %>%
  mutate(rownames = rownames(x)) %>%
  # gene symbol = part of the row name before the "|"
  mutate(nms = sapply(rownames(x), function(i) {
    strsplit(x = i, split = "|", fixed = TRUE)[[1]][[1]]
  })) %>%
  # long format: one row per (row name, column) pair
  melt(id.vars = c("nms", "rownames")) %>%
  # left join against y in long format, keeping every cell of x
  merge(., y %>% as.data.frame %>% mutate(nms = rownames(y)) %>% melt(id.vars = "nms"),
        by = c("variable", "nms"), all.x = TRUE) %>%
  select(-nms, -value.x) %>%
  dcast(formula = rownames ~ variable, value.var = "value.y") -> xy
# now put back the row names where they belong
rownames(xy) <- xy$rownames
# the only thing left is to arrange rows and columns as in x (and return a matrix like z)
xy <- as.matrix(xy[rownames(x), colnames(x)])
```
Or have I misunderstood some of your points?
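For comparison, a base-R sketch of the same lookup that skips the melt/merge/dcast round trip: match() the row keys and the column names of x against y, then copy the matching block of y into an all-NA matrix shaped like x. This is a swapped-in technique, not the approach above, and it assumes the row names of y are unique (match() returns the first hit). Avoiding the long-format intermediate tables should help on larger matrices, but that has not been benchmarked here.

```r
# Example data from the question
x <- structure(c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1,
  1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0,
  0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0), .Dim = c(11L,
  5L), .Dimnames = list(c("AACSL|729522", "AACS|65985", "AADACL2|344752",
  "AADACL3|126767", "AADACL4|343066", "AADAC|13", "AADAT|51166",
  "AAGAB|79719", "AAK1|22848", "AAK12|14", "AANAT|15"), c("S18",
  "S20", "S45", "S95", "S100")))
y <- structure(c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
  0, 0, 0, 0, 0, 0), .Dim = c(11L, 4L), .Dimnames = list(c("A1BG",
  "A1CF", "A2ML1", "A4GALT", "AACS", "AAK1", "AARD", "AARS2", "AASDHPPT",
  "AASS", "BAACS"), c("S18", "S10", "S45", "S95")))

# Row key = part of the x row name before the "|"; full names stay in dimnames
keys <- sub("\\|.*", "", rownames(x))

# Position in y of each row key / column name of x (NA where y lacks it)
ri <- match(keys, rownames(y))
ci <- match(colnames(x), colnames(y))

# Start from an all-NA matrix with the shape and names of x
z <- matrix(NA_real_, nrow = nrow(x), ncol = ncol(x), dimnames = dimnames(x))

# Copy the block of y whose rows and columns both exist in x
ok_r <- !is.na(ri)
ok_c <- !is.na(ci)
z[ok_r, ok_c] <- y[ri[ok_r], ci[ok_c], drop = FALSE]

print(z)
```

With the example data, only the AACS and AAK1 rows in columns S18, S45 and S95 receive values; every other cell stays NA.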