Reputation: 1449
I have two datasets, each having the form:
Gene1Name, 234
Gene2Name, 445
Gene3Name, 23
...
GeneNName, 554
The gene names are identical in both datasets. The numbers in the second column are the expression counts for the corresponding gene.
I want to perform a differential gene expression analysis on these datasets. For that, I am using the DESeq2 library.
To use the DESeq function, one first needs to create an object:
dds <- DESeqDataSetFromMatrix(countData=data, colData=meta, design=~sampletype)
In my case, what needs to be passed as arguments to the DESeqDataSetFromMatrix function?
Upvotes: 0
Views: 3058
Reputation: 4520
I think following this simple example might at least help you solve your real problem.
We start by preparing dummy data sets (please read how to make a minimal reproducible example):
treatment data set:
library(tidyverse)
set.seed(56154455)
treatment <- data.frame(
  geneName = LETTERS,
  cts = sample(0:1000, 26)
)
head(treatment)
# geneName cts
# 1 A 834
# 2 B 860
# 3 C 950
# 4 D 302
# 5 E 979
# 6 F 159
control data set:
set.seed(56154455)
control <- treatment[sample(1:26, 26), ]
control[, 1] <- treatment[, 1]
head(control)
# geneName cts
# 3 A 950
# 23 B 41
# 15 C 889
# 20 D 629
# 14 E 398
# 4 F 302
Join treatment and control by geneName:
cts <- full_join(treatment, control, by = 'geneName') %>%
  rename('treatment' = cts.x, 'control' = cts.y) %>%
  column_to_rownames('geneName') %>%
  as.matrix
head(cts)
# treatment control
# A 331 737
# B 914 676
# C 161 161
# D 592 769
# E 946 74
# F 813 314
coldata table:
Remember, this is just a dummy example, so your real coldata might include any number of columns that reflect the design of your experiment. However, the number of rows in your coldata has to be equal to the number of columns in your experimental data (here it is cts). Please read the documentation for the SummarizedExperiment class, where you can find a detailed explanation. Another great resource is Rafa's book.
coldata <- matrix(c("DMSO", "1xPBS"), dimnames = list(colnames(cts), 'treatment'))
coldata
# treatment
# treatment "DMSO"
# control "1xPBS"
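The row/column requirement described above can be checked before building the object. This is only a minimal base R sketch: the small cts matrix below is a hypothetical stand-in with made-up counts, not the data from this answer.

```r
# Hypothetical stand-ins for the cts and coldata objects (made-up counts)
cts <- matrix(c(10L, 20L, 30L, 40L), nrow = 2,
              dimnames = list(c("A", "B"), c("treatment", "control")))
coldata <- matrix(c("DMSO", "1xPBS"),
                  dimnames = list(colnames(cts), "treatment"))

# One metadata row per count column
stopifnot(ncol(cts) == nrow(coldata))
# Row names of the metadata match the sample names, in the same order
stopifnot(identical(rownames(coldata), colnames(cts)))
```

If either check fails, it is usually easier to fix the metadata table than to debug the error raised later.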
DESeqDataSet:
dds <- DESeq2::DESeqDataSetFromMatrix(
  countData = cts,
  colData = coldata,
  design = ~treatment
)
Where:
countData is your experimental data, prepared as above;
colData is your coldata matrix, with experimental metadata;
~treatment is the formula describing the experimental model you test in your experiment. It could be anything like ~ treatment + sex * age, etc.
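To see what a design formula expands to, base R's model.matrix can be applied to a sample sheet. The sketch below uses a hypothetical six-sample table; the sex and age columns and all values are invented purely for illustration.

```r
# Hypothetical sample sheet; column names mirror the formula mentioned above
samples <- data.frame(
  treatment = factor(c("DMSO", "DMSO", "DMSO", "1xPBS", "1xPBS", "1xPBS")),
  sex       = factor(c("F", "M", "F", "M", "F", "M")),
  age       = c(30, 45, 52, 38, 41, 60)
)

# Simple design: an intercept plus one treatment coefficient
mm_simple <- model.matrix(~ treatment, data = samples)
colnames(mm_simple)

# sex * age expands to sex + age + sex:age
mm_full <- model.matrix(~ treatment + sex * age, data = samples)
colnames(mm_full)
```

Each column of the resulting matrix corresponds to one coefficient of the model, so a richer formula needs correspondingly more samples.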
dds
# class: DESeqDataSet
# dim: 26 2
# metadata(1): version
# assays(1): counts
# rownames(26): A B ... Y Z
# rowData names(0):
# colnames(2): treatment control
# colData names(1): treatment
Upvotes: 2
Reputation: 9705
You just need to combine the two count vectors into a matrix.
Since you said your two datasets contain two columns, I assume the first is the gene name and the second is the count. You also mentioned that the gene names are the same. So you can do this:
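library(DESeq2)
# Count matrix: one column per sample, gene names as row names
data <- cbind(x1[, 2], x2[, 2])
rownames(data) <- x1[, 1]
colnames(data) <- c("sample1", "sample2")
# Sample metadata: one row per column of data, named after the samples
meta <- data.frame(sampletype = factor(c("A", "B")), row.names = colnames(data))
dds <- DESeqDataSetFromMatrix(countData = data, colData = meta, design = ~sampletype)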
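The cbind approach relies on both tables listing the genes in the same order. If that is not guaranteed, joining on the gene name is a safer variant. This is a base R sketch, and the x1 and x2 tables below, including their column names gene and count, are hypothetical stand-ins for your two datasets.

```r
# Hypothetical stand-ins for the two input tables (gene name, count)
x1 <- data.frame(gene = c("Gene1", "Gene2", "Gene3"), count = c(234L, 445L, 23L))
x2 <- data.frame(gene = c("Gene3", "Gene1", "Gene2"), count = c(30L, 250L, 400L))

# Join on the gene name so the row order of the inputs does not matter
merged <- merge(x1, x2, by = "gene", suffixes = c(".sample1", ".sample2"))

# Build the count matrix with genes as row names and samples as columns
data <- as.matrix(merged[, c("count.sample1", "count.sample2")])
rownames(data) <- merged$gene
colnames(data) <- c("sample1", "sample2")
data
```

The resulting data matrix can then be passed as countData in the same way as above.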
Upvotes: 1