mercury0114

Reputation: 1449

How to create DESeqDataSetFromMatrix from 2 vectors of numbers?

I have two datasets, each having the form:

Gene1Name, 234
Gene2Name, 445
Gene3Name, 23
...
GeneNName, 554

The gene names are identical for each of the 2 datasets. The numbers on the second column are the expression counts for the corresponding gene.

I want to perform a differential gene expression analysis on these datasets. For that, I am using the DESeq2 package.

To use the DESeq function, one first needs to create a DESeqDataSet object:

dds <- DESeqDataSetFromMatrix(countData=data, colData=meta, design=~sampletype)

For my case, what needs to be passed as arguments into the DESeqDataSetFromMatrix function?

Upvotes: 0

Views: 3058

Answers (2)

utubun

Reputation: 4520

Following this simple example should at least help you solve your real problem.

We start by preparing a dummy data set (please read how to make a minimal reproducible example).

Make a treatment data set:

library(tidyverse)

set.seed(56154455)

treatment <- data.frame(
  geneName = LETTERS,
  cts      = sample(0:1000, 26)
)

head(treatment)

#   geneName cts
# 1        A 834
# 2        B 860
# 3        C 950
# 4        D 302
# 5        E 979
# 6        F 159

Make a control data set:

set.seed(56154455)

control   <- treatment[sample(1:26, 26), ]
control[, 1] <- treatment[, 1]

head(control)

#    geneName cts
# 3         A 950
# 23        B  41
# 15        C 889
# 20        D 629
# 14        E 398
# 4         F 302

Join treatment and control by geneName:

cts <- full_join(treatment, control, by = 'geneName') %>%
  rename('treatment' = cts.x, 'control' = cts.y) %>%
  column_to_rownames('geneName') %>%
  as.matrix

head(cts)

#   treatment control
# A       331     737
# B       914     676
# C       161     161
# D       592     769
# E       946      74
# F       813     314

Prepare your coldata table:

Remember, this is just a dummy example, so your real coldata might include any number of columns reflecting the design of your experiment. However, the number of rows in your coldata has to equal the number of columns in your count data (here, cts). Please read the documentation for the SummarizedExperiment class, where you can find a detailed explanation. Another great resource is Rafa's book.

coldata <- matrix(c("DMSO", "1xPBS"), dimnames = list(colnames(cts), 'treatment'))

coldata

#           treatment
# treatment "DMSO"
# control   "1xPBS"
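
The row/column requirement described above can be checked before building the object. This is a minimal base-R sketch, not part of the original answer: the small `cts` and `coldata` objects are made-up stand-ins for the ones above, and storing the metadata as a data.frame with a factor column is an assumption of this sketch.

```r
# Toy stand-ins for the objects built above (values are invented).
cts <- matrix(
  c(10L, 20L, 30L, 15L, 25L, 35L),
  ncol = 2,
  dimnames = list(c("A", "B", "C"), c("treatment", "control"))
)

# Sample metadata: one row per sample, row names matching the count columns.
coldata <- data.frame(
  treatment = factor(c("DMSO", "1xPBS")),
  row.names = colnames(cts)
)

# Stop early if the metadata rows do not line up with the count columns.
stopifnot(
  nrow(coldata) == ncol(cts),
  identical(rownames(coldata), colnames(cts))
)
```

If the check fails, reorder or rename the rows of coldata rather than the columns of the count matrix, so the counts themselves stay untouched.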

Finally, create your DESeqDataSet:

dds <- DESeq2::DESeqDataSetFromMatrix(
  countData = cts, 
  colData   = coldata, 
  design    = ~treatment
  )

Where:

  • countData is your matrix of counts, prepared as above, with genes as rows and samples as columns;
  • colData is your coldata table holding the experimental metadata, with one row per sample;
  • design = ~treatment is the formula describing the model you test in your experiment. It can be any formula built from the columns of coldata, for example ~ treatment + sex * age.

dds

# class: DESeqDataSet 
# dim: 26 2 
# metadata(1): version
# assays(1): counts
# rownames(26): A B ... Y Z
# rowData names(0):
# colnames(2): treatment control
# colData names(1): treatment

Upvotes: 2

thc

Reputation: 9705

You just need to bind the two count vectors together as the columns of a matrix.

Since you said each of your two datasets contains two columns, I assume the first is the gene name and the second is the count. You also mentioned that the gene names are the same in both. So you can do this:

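library(DESeq2)

# x1 and x2 are the two datasets: column 1 = gene name, column 2 = count
# (this assumes the genes are in the same order in both)
data <- cbind(x1[,2], x2[,2])
rownames(data) <- x1[,1]
colnames(data) <- c("sample1", "sample2")

# one metadata row per sample, named to match the columns of data
meta <- data.frame(sampletype = c("A", "B"), row.names = colnames(data))

dds <- DESeqDataSetFromMatrix(countData=data, colData=meta, design=~sampletype)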
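
If the two datasets are not guaranteed to list the genes in the same order, joining on the gene name may be safer than a plain cbind. Below is a minimal base-R sketch under that assumption; the data frames `x1` and `x2`, their column names `gene` and `count`, and all values are hypothetical and only serve as an illustration.

```r
# Hypothetical inputs: same genes, different row order.
x1 <- data.frame(gene = c("Gene1", "Gene2", "Gene3"), count = c(234L, 445L, 23L))
x2 <- data.frame(gene = c("Gene3", "Gene1", "Gene2"), count = c(30L, 200L, 500L))

# Join on gene name so each row pairs the counts of the same gene.
merged <- merge(x1, x2, by = "gene", suffixes = c(".sample1", ".sample2"))

# Build the count matrix: genes as rows, samples as columns.
data <- as.matrix(merged[, c("count.sample1", "count.sample2")])
rownames(data) <- merged$gene
colnames(data) <- c("sample1", "sample2")

# One metadata row per sample, named to match the matrix columns.
meta <- data.frame(sampletype = factor(c("A", "B")), row.names = colnames(data))
```

The resulting `data` and `meta` can then be passed as countData and colData in the same DESeqDataSetFromMatrix call as above.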

Upvotes: 1
