Reputation: 407
I want to transform a named vector into a matrix and fill in the missing values with 0s.
For example, I have a dataframe like this:
col1 col2 col3
Cancer1 Gene1 2.1
Cancer1 Gene2 2.51
Cancer1 Gene3 3.0
Cancer2 Gene1 0.9
It has two columns of names, col1 and col2. I want to transform this into a matrix, like:
      Cancer1 Cancer2
Gene1 2.1     0.9
Gene2 2.51    0
Gene3 3.0     0
Any col1/col2 combination that is missing from the data should be filled with 0.
How can I do this efficiently in R?
Upvotes: 2
Views: 5465
Reputation: 263301
Either xtabs or tapply should do it.
tapply(my.df$col3, rev(my.df[-3]), c)
#        col1
# col2    cancer1 cancer2
#   gene1     2.1     2.2
#   gene2     2.5      NA
#   gene3      NA     3.0
tapply would have the advantage that, if there were multiple instances of any one combination, you could return a function result like mean applied to the group.
xtabs(col3 ~ col2 + col1, my.df) # same layout; sums col3 per cell, so empty cells are 0 rather than NA
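The point about aggregating repeated combinations with tapply, and then zero-filling the gaps as the question asks, can be sketched as follows. The duplicated cancer1/gene1 row is made up for illustration and does not come from the original data.

```r
# Example data; the second cancer1/gene1 row is invented to show aggregation
my.df <- data.frame(
  col1 = c("cancer1", "cancer1", "cancer1", "cancer2"),
  col2 = c("gene1", "gene1", "gene2", "gene1"),
  col3 = c(2.0, 4.0, 2.5, 0.9)
)

# Average the duplicates; combinations that never occur come back as NA
m <- tapply(my.df$col3, my.df[c("col2", "col1")], mean)

# Replace the NA cells with 0
m[is.na(m)] <- 0
m
```

Swapping mean for sum, max or any other summary function changes how repeated combinations are collapsed.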
Note that tidyverse methods like spread are likely to give you data objects of a "special" class (not matrices), which may have annoying properties if you're not expecting them, or may seem wonderful if you are.
Upvotes: 3
Reputation: 28826
You can use the tidyr package:
tidyr::spread(mydata, col1, col3, fill = 0)
# col2 Cancer1 Cancer2
# 1 Gene1 2.10 0.9
# 2 Gene2 2.51 0.0
# 3 Gene3 3.00 0.0
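The result above is a data frame with the gene labels in an ordinary column, not a matrix. If a numeric matrix is what you need, one possible conversion in base R is sketched below. The object name wide is an assumption, and the data frame is typed in by hand as a stand-in for the spread() output so that the sketch runs without tidyr.

```r
# Stand-in for the spread() result shown above (hypothetical name `wide`)
wide <- data.frame(
  col2    = c("Gene1", "Gene2", "Gene3"),
  Cancer1 = c(2.10, 2.51, 3.00),
  Cancer2 = c(0.9, 0.0, 0.0)
)

# Drop the label column and convert the remaining numeric columns
mat <- as.matrix(wide[, -1])

# Reuse the labels as row names (as.character() in case col2 is a factor)
rownames(mat) <- as.character(wide$col2)
mat
```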
Data:
mydata <- structure(list(col1 = structure(c(1L, 1L, 1L, 2L), .Label = c("Cancer1",
"Cancer2"), class = "factor"), col2 = structure(c(1L, 2L, 3L,
1L), .Label = c("Gene1", "Gene2", "Gene3"), class = "factor"),
col3 = c(2.1, 2.51, 3, 0.9)), .Names = c("col1", "col2",
"col3"), class = "data.frame", row.names = c(NA, -4L))
Upvotes: 4
Reputation: 1975
You can do a nested sapply, looping through each gene and cancer type. Use levels() if the columns are factors, or unique() if they are character vectors.
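# stringsAsFactors = TRUE so that levels() works; since R 4.0 strings stay character by default
my.df <- data.frame(col1 = c("cancer1", "cancer1", "cancer2", "cancer2"),
                    col2 = c("gene1", "gene2", "gene3", "gene1"),
                    col3 = c(2.1, 2.5, 3.0, 2.2),
                    stringsAsFactors = TRUE)
# The outer loop builds one column per cancer type, the inner loop one row per gene
my.mat <- sapply(levels(my.df$col1), function(cancer) {
  sapply(levels(my.df$col2), function(gene) {
    tmp <- my.df[my.df$col1 == cancer & my.df$col2 == gene, "col3"]
    if (length(tmp) > 0) {
      as.numeric(tmp[1])  # first match wins if a pair occurs more than once
    } else {
      NA                  # return 0 here instead to zero-fill missing pairs
    }
  })
})
my.mat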
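The nested loop subsets the data frame once per cell. A vectorised variant of the same idea, sketched here under the assumption that each cancer/gene pair occurs at most once, starts from an all-zero matrix and fills it in one step by indexing with a two-column character matrix. The name my.mat0 is made up for this sketch.

```r
my.df <- data.frame(col1 = c("cancer1", "cancer1", "cancer2", "cancer2"),
                    col2 = c("gene1", "gene2", "gene3", "gene1"),
                    col3 = c(2.1, 2.5, 3.0, 2.2))

# as.character() makes this work for factor and character columns alike
genes   <- sort(unique(as.character(my.df$col2)))
cancers <- sort(unique(as.character(my.df$col1)))

# Start from zeros, so combinations absent from the data are already 0
my.mat0 <- matrix(0, nrow = length(genes), ncol = length(cancers),
                  dimnames = list(genes, cancers))

# Each row of the index matrix is a (row name, column name) pair
idx <- cbind(as.character(my.df$col2), as.character(my.df$col1))
my.mat0[idx] <- my.df$col3
my.mat0
```

If a pair does occur more than once, the last value assigned is kept, whereas the nested sapply keeps the first.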
Upvotes: 0