How to create DESeqDataSetFromMatrix from 2 vectors of numbers?

Question

I have two datasets, each having the form:

Gene1Name, 234
Gene2Name, 445
Gene3Name, 23
...
GeneNName, 554

The gene names are identical for each of the 2 datasets. The numbers on the second column are the expression counts for the corresponding gene.

I want to perform a differential gene expression analysis on these datasets. For that, I am using the DESeq2 package.

To use the DESeq function, one first needs to create an object:

dds <- DESeqDataSetFromMatrix(countData=data, colData=meta, design=~sampletype)

For my case, what needs to be passed as arguments into the DESeqDataSetFromMatrix function?

utubun · Accepted Answer

If you follow this simple example, it should at least help you solve your real problem.

We start by preparing a dummy data set (please read up on how to make a minimal reproducible example):

Make a `treatment` data set:

library(tidyverse)

set.seed(56154455)

treatment <- data.frame(
  geneName = LETTERS,
  cts      = sample(0:1000, 26)
)

head(treatment)

#   geneName cts
# 1        A 834
# 2        B 860
# 3        C 950
# 4        D 302
# 5        E 979
# 6        F 159

Make a `control` data set:

set.seed(56154455)

control   <- treatment[sample(1:26, 26), ]
control[, 1] <- treatment[, 1]

head(control)

#    geneName cts
# 3         A 950
# 23        B  41
# 15        C 889
# 20        D 629
# 14        E 398
# 4         F 302

Join `treatment` and `control` by `geneName`:

cts <- full_join(treatment, control, by = 'geneName') %>%
  rename('treatment' = cts.x, 'control' = cts.y) %>%
  column_to_rownames('geneName') %>%
  as.matrix

head(cts)

#   treatment control
# A       834     950
# B       860      41
# C       950     889
# D       302     629
# E       979     398
# F       159     302
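The same join can also be sketched in base R, which may help if you prefer not to load the tidyverse. The toy counts and the `.treatment`/`.control` suffixes below are made up for illustration; only the `geneName`/`cts` layout comes from the example above.

```r
# Hypothetical toy inputs with the same two-column layout as in the example
treatment <- data.frame(geneName = c("Gene1", "Gene2", "Gene3"),
                        cts      = c(234L, 445L, 23L))
control   <- data.frame(geneName = c("Gene3", "Gene1", "Gene2"),
                        cts      = c(30L, 250L, 400L))

# merge() matches rows on geneName, so the row order of the inputs does not matter
merged <- merge(treatment, control, by = "geneName",
                suffixes = c(".treatment", ".control"))

# Keep only the count columns and move the gene names into the row names
cts <- as.matrix(merged[, c("cts.treatment", "cts.control")])
dimnames(cts) <- list(as.character(merged$geneName), c("treatment", "control"))

# A full join would give NA for genes missing from one set; stop early if that happens
stopifnot(!anyNA(cts), is.numeric(cts))
cts
```

Note that `merge()` keeps only genes present in both inputs by default, whereas `full_join()` keeps all genes and fills the gaps with `NA`, so the `anyNA()` check matters more with the latter.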

Prepare your `coldata` table:

Remember, this is just a dummy example, so your real coldata might include any number of columns reflecting the design of your experiment. However, the number of rows in your coldata has to equal the number of columns in your experimental data (here, cts). Please read the documentation for the SummarizedExperiment class, where you can find a detailed explanation. Another great resource is Rafa's book.

coldata <- matrix(c("DMSO", "1xPBS"), dimnames = list(colnames(cts), 'treatment'))

coldata

#        treatment
# treatment "DMSO"   
# control   "1xPBS"
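The row/column rule can be checked before the object is built. The following is a minimal sketch in base R, assuming a small made-up count matrix; it uses a data.frame for the metadata (the DESeq2 documentation describes `colData` as a DataFrame or data.frame), which also lets the column be a factor.

```r
# Toy count matrix: genes in rows, samples in columns (values are illustrative)
cts <- matrix(
  c(834L, 860L, 950L, 950L, 41L, 889L),
  ncol = 2,
  dimnames = list(c("A", "B", "C"), c("treatment", "control"))
)

# One row of metadata per sample, with row names taken from the count columns
coldata <- data.frame(
  treatment = factor(c("DMSO", "1xPBS")),
  row.names = colnames(cts)
)

# There must be exactly one metadata row per count column, in the same order
stopifnot(
  nrow(coldata) == ncol(cts),
  identical(rownames(coldata), colnames(cts))
)
```

If either condition fails, it is usually easier to fix the ordering here than to debug it after the DESeqDataSet has been created.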

Finally, create your `DESeqDataSet`:

dds <- DESeq2::DESeqDataSetFromMatrix(
  countData = cts, 
  colData   = coldata, 
  design    = ~treatment
  )

Where:

countData is your experimental data, prepared as above;
colData is your coldata matrix, with experimental metadata;
~treatment is the formula describing the experimental model you test in your experiment. It could be anything, such as ~ treatment + sex * age.
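To see how a formula such as ~treatment is expanded, it can help to look at the model matrix R builds from it. This is a sketch in base R only; treating "1xPBS" as the baseline is an assumption made here for illustration, and the level names are simply those of the dummy coldata.

```r
# Metadata as a data.frame, with the factor levels given explicitly.
# The first level is the baseline; choosing "1xPBS" is an assumption for this sketch.
coldata <- data.frame(
  treatment = factor(c("DMSO", "1xPBS"), levels = c("1xPBS", "DMSO")),
  row.names = c("treatment", "control")
)

# model.matrix() expands the formula into the columns that get estimated
mm <- model.matrix(~ treatment, data = coldata)
colnames(mm)
```

With the levels set this way, the second column of the model matrix represents DMSO relative to 1xPBS. Leaving the levels implicit makes R order them alphabetically, which may not match the comparison you have in mind.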

☠

dds

# class: DESeqDataSet 
# dim: 26 2 
# metadata(1): version
# assays(1): counts
# rownames(26): A B ... Y Z
# rowData names(0):
# colnames(2): treatment control
# colData names(1): treatment
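For the two real data sets from the question, the count matrix might be assembled along these lines. This is only a sketch: the inline strings stand in for the two files, the counts in `control_txt` are invented, and `read_counts` is a hypothetical helper name, not part of any package.

```r
# Stand-ins for the two files; with real data, pass file = "<path>" instead of text =
treatment_txt <- "Gene1Name, 234\nGene2Name, 445\nGene3Name, 23"
control_txt   <- "Gene1Name, 250\nGene2Name, 400\nGene3Name, 30"

# Hypothetical helper: read a header-less "name, count" table
read_counts <- function(...) {
  read.csv(..., header = FALSE, col.names = c("geneName", "cts"),
           strip.white = TRUE, stringsAsFactors = FALSE)
}

treatment <- read_counts(text = treatment_txt)
control   <- read_counts(text = control_txt)

# The question states the gene names are identical in both sets; verify it
stopifnot(identical(treatment$geneName, control$geneName))

# Bind the two count vectors into a genes-by-samples matrix
cts <- cbind(treatment = treatment$cts, control = control$cts)
rownames(cts) <- treatment$geneName
cts
```

If the identity check fails because the genes are in a different order, joining by gene name as in the example above is the safer route than `cbind()`.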
