Reputation: 978
I have data that looks like this:
dput(a)
structure(list(ENSEMBL = structure(c(1L, 2L, 3L, 3L, 3L, 4L), .Label = c("ENSG00000005187",
"ENSG00000006740", "ENSG00000008277", "ENSG00000013810"), class = "factor"),
log2FoldChange_Expression = c(-2.2756549273843, -1.76655532051033,
-1.58489726654531, -1.58489726654531, -1.58489726654531,
-2.04282868170093), log2FoldChange_Region = c(-2.11261476936419,
-2.37119008459253, -1.59565539803813, -2.4954310786834, -2.11050911441613,
-1.81996408306615), Peak_Region = structure(c(5L, 6L, 4L,
2L, 3L, 1L), .Label = c("Peak147010", "Peak194531", "Peak194535",
"Peak194536", "Peak75759", "Peak81940"), class = "factor")), class = "data.frame", row.names = c(NA,
-6L))
A small subset of the data frame:
a
ENSEMBL log2FoldChange_Expression log2FoldChange_Region Peak_Region
1 ENSG00000005187 -2.275655 -2.112615 Peak75759
2 ENSG00000006740 -1.766555 -2.371190 Peak81940
3 ENSG00000008277 -1.584897 -1.595655 Peak194536
4 ENSG00000008277 -1.584897 -2.495431 Peak194531
5 ENSG00000008277 -1.584897 -2.110509 Peak194535
6 ENSG00000013810 -2.042829 -1.819964 Peak147010
My objective is to fit a regression model with log2FoldChange_Expression as the response variable and log2FoldChange_Region as the independent variable.
The basic lm call, which I know how to run, is this:
res=lm(log2FoldChange_Expression ~ log2FoldChange_Region, data=Down_data)
What I would like to do (and I am not sure whether it is logical) is fit that model for each Peak_Region and its respective ENSEMBL and see a p-value for each row. Is that possible? I would like a final output table with a pvalue column filled in for each row, like this:
ENSEMBL log2FoldChange_Expression log2FoldChange_Region Peak_Region pvalue
1 ENSG00000005187 -2.275655 -2.112615 Peak75759
2 ENSG00000006740 -1.766555 -2.371190 Peak81940
3 ENSG00000008277 -1.584897 -1.595655 Peak194536
4 ENSG00000008277 -1.584897 -2.495431 Peak194531
5 ENSG00000008277 -1.584897 -2.110509 Peak194535
6 ENSG00000013810 -2.042829 -1.819964 Peak147010
Upvotes: 0
Views: 153
Reputation: 1560
Look at my last comment.
Down_data <- structure(list(ENSEMBL = structure(c(1L, 2L, 3L, 3L, 3L, 4L),
.Label = c("ENSG00000005187","ENSG00000006740", "ENSG00000008277", "ENSG00000013810"),
class = "factor"),
log2FoldChange_Expression = c(-2.2756549273843, -1.76655532051033,-1.58489726654531, -1.58489726654531, -1.58489726654531,-2.04282868170093),
log2FoldChange_Region = c(-2.11261476936419,-2.37119008459253, -1.59565539803813, -2.4954310786834, -2.11050911441613,-1.81996408306615),
Peak_Region = structure(c(5L, 6L, 4L,2L, 3L, 1L),
.Label = c("Peak147010", "Peak194531", "Peak194535","Peak194536", "Peak75759", "Peak81940"),
class = "factor")),
class = "data.frame",row.names = c(NA,-6L))
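# Regress expression fold change on region fold change, with ENSEMBL (gene) as a factor
res <- lm(log2FoldChange_Expression ~ log2FoldChange_Region + ENSEMBL, data = Down_data)
# summary() reports one p-value per model term (coefficient), not one per data row
summary(res)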
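As a rough sketch of how the p-values can be pulled out of a fitted model as numbers: a single regression gives one p-value per coefficient, not one per row. The example below uses the simpler model from the question, because on the six-row subset the model with ENSEMBL appears to fit the data exactly (each gene's expression value is constant across its rows), so its p-values would not be meaningful there. The names coef_table and p_slope are made up for illustration.

```r
# Rebuild the six-row example (values copied from the dput output in the question)
Down_data <- data.frame(
  ENSEMBL = factor(c("ENSG00000005187", "ENSG00000006740", "ENSG00000008277",
                     "ENSG00000008277", "ENSG00000008277", "ENSG00000013810")),
  log2FoldChange_Expression = c(-2.2756549273843, -1.76655532051033,
                                -1.58489726654531, -1.58489726654531,
                                -1.58489726654531, -2.04282868170093),
  log2FoldChange_Region = c(-2.11261476936419, -2.37119008459253,
                            -1.59565539803813, -2.4954310786834,
                            -2.11050911441613, -1.81996408306615),
  Peak_Region = factor(c("Peak75759", "Peak81940", "Peak194536",
                         "Peak194531", "Peak194535", "Peak147010"))
)

# Simple regression from the question: expression ~ region
res <- lm(log2FoldChange_Expression ~ log2FoldChange_Region, data = Down_data)

# coef(summary(res)) is a matrix with one row per model term
# (intercept and slope) and a "Pr(>|t|)" column holding the p-values
coef_table <- coef(summary(res))
print(coef_table)

# p-value for the slope on log2FoldChange_Region (illustrative name)
p_slope <- coef_table["log2FoldChange_Region", "Pr(>|t|)"]
print(p_slope)
```

If a p-value per gene is really what is wanted, each gene would need enough rows of its own to fit a separate model, which the subset shown here does not have.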
Upvotes: 2