Tian

Reputation: 407

How to convert a dataframe to a matrix

I want to convert a long-format data frame (two name columns plus a value column) into a matrix and fill any missing combinations with 0s.

For example, I have a dataframe like this:

col1     col2    col3
Cancer1  Gene1   2.1
Cancer1  Gene2   2.51
Cancer1  Gene3   3.0
Cancer2  Gene1   0.9

It has two columns of names, col1 and col2, and one column of values, col3. I want to transform it into a matrix with the col2 names as rows and the col1 names as columns, like this:

        Cancer1   Cancer2
Gene1   2.1       0.9
Gene2   2.51      0
Gene3   3.0       0

If a combination of names is missing from the data frame, its cell should be filled with 0.

How can I do this efficiently in R?

Upvotes: 2

Views: 5465

Answers (3)

IRTFM

Reputation: 263301

Either xtabs or tapply should do it. (The output below uses the my.df example defined in Damiano Fantini's answer.)

tapply(my.df$col3, rev(my.df[-3]), c)
       col1
col2    cancer1 cancer2
  gene1     2.1     2.2
  gene2     2.5      NA
  gene3      NA     3.0

tapply has the advantage that, if any combination occurs more than once, you can apply a summary function such as mean to each group. Combinations that never occur come back as NA, so replace them with 0 afterwards if that is what you need.
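As a rough sketch of that point, the example below adds a duplicated gene1/cancer2 row (invented here purely for illustration), averages within each cell, and then zero-fills the empty cells. The name avg is hypothetical.

```r
# Example data; the fifth row is an invented duplicate of gene1/cancer2
my.df <- data.frame(
  col1 = c("cancer1", "cancer1", "cancer2", "cancer2", "cancer2"),
  col2 = c("gene1", "gene2", "gene3", "gene1", "gene1"),
  col3 = c(2.1, 2.5, 3.0, 2.2, 1.8)
)

# Rows come from col2, columns from col1; mean() collapses duplicates
avg <- tapply(my.df$col3, list(col2 = my.df$col2, col1 = my.df$col1), mean)

# Combinations that never occur are NA; replace them with 0
avg[is.na(avg)] <- 0
avg
```

Swapping mean for another summary (sum, max, length) changes only how duplicates are combined.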

xtabs(col3 ~ col2 + col1, my.df)  # same layout, but empty cells are 0 instead of NA (and duplicates are summed)
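For the data in the question, a minimal sketch along these lines should give the requested zero-filled result. The names df, xt, and m are illustrative, and the last step is only needed if you want a plain matrix rather than an object of class "xtabs"/"table".

```r
# The example data from the question
df <- data.frame(
  col1 = c("Cancer1", "Cancer1", "Cancer1", "Cancer2"),
  col2 = c("Gene1", "Gene2", "Gene3", "Gene1"),
  col3 = c(2.1, 2.51, 3.0, 0.9)
)

# xtabs sums col3 within each cell, so absent combinations are 0
xt <- xtabs(col3 ~ col2 + col1, data = df)

# Rebuild as a plain numeric matrix, dropping the table class
m <- matrix(as.vector(xt), nrow = nrow(xt), dimnames = unname(dimnames(xt)))
m
```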

Note that tidyverse methods such as spread are likely to give you data objects of a "special" class (data frames or tibbles, not matrices). If you are not expecting them, they may have annoying properties; if you are expecting them, they may seem wonderful.

Upvotes: 3

M--

Reputation: 28826

You can use the tidyr package:

tidyr::spread(mydata, col1, col3, fill = 0)

#    col2 Cancer1 Cancer2 
# 1 Gene1    2.10     0.9 
# 2 Gene2    2.51     0.0 
# 3 Gene3    3.00     0.0

Data:

mydata <- structure(list(col1 = structure(c(1L, 1L, 1L, 2L), .Label = c("Cancer1", 
"Cancer2"), class = "factor"), col2 = structure(c(1L, 2L, 3L, 
1L), .Label = c("Gene1", "Gene2", "Gene3"), class = "factor"), 
col3 = c(2.1, 2.51, 3, 0.9)), .Names = c("col1", "col2", 
"col3"), class = "data.frame", row.names = c(NA, -4L))
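The result above is a data frame, not a matrix. If a true matrix is needed, one possible follow-up step in base R is sketched below. Here wide is a hand-built stand-in for the spread() output (an assumption, so the snippet runs without tidyr), and mat is an illustrative name.

```r
# Hypothetical stand-in for the wide data frame printed above
wide <- data.frame(
  col2    = c("Gene1", "Gene2", "Gene3"),
  Cancer1 = c(2.10, 2.51, 3.00),
  Cancer2 = c(0.9, 0, 0)
)

# Drop the name column, convert the numeric columns, then label the rows
mat <- as.matrix(wide[, -1])
rownames(mat) <- wide$col2
mat
```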

Upvotes: 4

Damiano Fantini

Reputation: 1975

You can do a nested sapply, looping through each gene and cancer type. Use levels() if the name columns are factors, or unique() if they are character vectors.

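my.df <- data.frame(col1 = c("cancer1", "cancer1", "cancer2", "cancer2"),
                    col2 = c("gene1", "gene2", "gene3", "gene1"),
                    col3 = c(2.1, 2.5, 3.0, 2.2),
                    stringsAsFactors = TRUE)  # levels() needs factors; R >= 4.0 defaults to character

# Outer loop builds one column per cancer type, inner loop one row per gene
my.mat <- sapply(levels(my.df$col1), function(cancer) {
  sapply(levels(my.df$col2), function(gene) {
    tmp <- my.df[my.df$col1 == cancer & my.df$col2 == gene, "col3"]
    if (length(tmp) > 0) {
      as.numeric(tmp[1])  # keep the first value if a pair occurs more than once
    } else {
      NA  # use 0 here instead to fill missing combinations with zeros
    }
  })
})
my.mat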
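For larger inputs, the same table can probably be filled without looping over every gene/cancer pair. The sketch below is one vectorized variant (not part of the original answer): it pre-allocates a zero matrix and assigns values through a two-column character index. The names genes, cancers, and fill.mat are illustrative, and if a pair occurs more than once the last value wins.

```r
my.df <- data.frame(
  col1 = c("cancer1", "cancer1", "cancer2", "cancer2"),
  col2 = c("gene1", "gene2", "gene3", "gene1"),
  col3 = c(2.1, 2.5, 3.0, 2.2)
)

# Row and column labels; as.character() also covers factor columns
genes   <- sort(unique(as.character(my.df$col2)))
cancers <- sort(unique(as.character(my.df$col1)))

# Start from all zeros so missing combinations are already filled
fill.mat <- matrix(0, nrow = length(genes), ncol = length(cancers),
                   dimnames = list(genes, cancers))

# Each row of the index matrix is a (row name, column name) pair
idx <- cbind(as.character(my.df$col2), as.character(my.df$col1))
fill.mat[idx] <- my.df$col3
fill.mat
```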

Upvotes: 0
