PesKchan

Fitting a regression model row-wise

I have data that looks like this:

dput(a)
structure(list(ENSEMBL = structure(c(1L, 2L, 3L, 3L, 3L, 4L), .Label = c("ENSG00000005187", 
"ENSG00000006740", "ENSG00000008277", "ENSG00000013810"), class = "factor"), 
    log2FoldChange_Expression = c(-2.2756549273843, -1.76655532051033, 
    -1.58489726654531, -1.58489726654531, -1.58489726654531, 
    -2.04282868170093), log2FoldChange_Region = c(-2.11261476936419, 
    -2.37119008459253, -1.59565539803813, -2.4954310786834, -2.11050911441613, 
    -1.81996408306615), Peak_Region = structure(c(5L, 6L, 4L, 
    2L, 3L, 1L), .Label = c("Peak147010", "Peak194531", "Peak194535", 
    "Peak194536", "Peak75759", "Peak81940"), class = "factor")), class = "data.frame", row.names = c(NA, 
-6L))

A small subset of the data frame:

a
          ENSEMBL log2FoldChange_Expression log2FoldChange_Region Peak_Region
1 ENSG00000005187                 -2.275655             -2.112615   Peak75759
2 ENSG00000006740                 -1.766555             -2.371190   Peak81940
3 ENSG00000008277                 -1.584897             -1.595655  Peak194536
4 ENSG00000008277                 -1.584897             -2.495431  Peak194531
5 ENSG00000008277                 -1.584897             -2.110509  Peak194535
6 ENSG00000013810                 -2.042829             -1.819964  Peak147010

My objective is to fit a regression model in which log2FoldChange_Expression is the response variable and log2FoldChange_Region is the independent variable.

The basic lm call I know how to run is this:

res=lm(log2FoldChange_Expression ~ log2FoldChange_Region, data=Down_data)
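For reference, a minimal sketch of where the p-value sits in this basic model, assuming the data frame is named Down_data as in the lm call above (it is rebuilt here from the dput values). The names coef_tab and slope_p are illustrative. summary() reports one slope for the whole data set, so this model yields exactly one p-value for log2FoldChange_Region, not one per row.

```r
# Rebuild the six-row sample from the question's dput() output
Down_data <- data.frame(
  ENSEMBL = factor(c("ENSG00000005187", "ENSG00000006740", "ENSG00000008277",
                     "ENSG00000008277", "ENSG00000008277", "ENSG00000013810")),
  log2FoldChange_Expression = c(-2.2756549273843, -1.76655532051033, -1.58489726654531,
                                -1.58489726654531, -1.58489726654531, -2.04282868170093),
  log2FoldChange_Region = c(-2.11261476936419, -2.37119008459253, -1.59565539803813,
                            -2.4954310786834, -2.11050911441613, -1.81996408306615),
  Peak_Region = factor(c("Peak75759", "Peak81940", "Peak194536",
                         "Peak194531", "Peak194535", "Peak147010"))
)

# One model for all rows: a single intercept and a single slope
res <- lm(log2FoldChange_Expression ~ log2FoldChange_Region, data = Down_data)

# Coefficient matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
coef_tab <- coef(summary(res))

# The p-value for the slope: one number for the whole data set
slope_p <- coef_tab["log2FoldChange_Region", "Pr(>|t|)"]
print(slope_p)
```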

Here is what I want to do next, although I'm not sure whether it is statistically sensible:

  1. For each Peak_Region and its corresponding ENSEMBL ID, I want to fit that model and get a p-value for each row. Is this possible?

I want a final output table with a p-value for each row:

ENSEMBL log2FoldChange_Expression log2FoldChange_Region Peak_Region           pvalue 
1 ENSG00000005187                 -2.275655             -2.112615   Peak75759
2 ENSG00000006740                 -1.766555             -2.371190   Peak81940
3 ENSG00000008277                 -1.584897             -1.595655  Peak194536
4 ENSG00000008277                 -1.584897             -2.495431  Peak194531
5 ENSG00000008277                 -1.584897             -2.110509  Peak194535
6 ENSG00000013810                 -2.042829             -1.819964  Peak147010
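As a quick check on the per-row idea, here is a sketch that literally fits the same lm on each single row (data rebuilt from the dput above; per_row, slopes and res_df are illustrative names). With only one observation the log2FoldChange_Region column is collinear with the intercept, so lm() reports the slope as NA and leaves zero residual degrees of freedom, which means no p-value can be computed for an individual row.

```r
# Rebuild the six-row sample from the question's dput() output
Down_data <- data.frame(
  ENSEMBL = factor(c("ENSG00000005187", "ENSG00000006740", "ENSG00000008277",
                     "ENSG00000008277", "ENSG00000008277", "ENSG00000013810")),
  log2FoldChange_Expression = c(-2.2756549273843, -1.76655532051033, -1.58489726654531,
                                -1.58489726654531, -1.58489726654531, -2.04282868170093),
  log2FoldChange_Region = c(-2.11261476936419, -2.37119008459253, -1.59565539803813,
                            -2.4954310786834, -2.11050911441613, -1.81996408306615),
  Peak_Region = factor(c("Peak75759", "Peak81940", "Peak194536",
                         "Peak194531", "Peak194535", "Peak147010"))
)

# Fit the question's model separately on each single row
per_row <- lapply(seq_len(nrow(Down_data)), function(i) {
  lm(log2FoldChange_Expression ~ log2FoldChange_Region, data = Down_data[i, ])
})

# With n = 1 the slope column is aliased with the intercept, so lm() sets it to NA
slopes <- sapply(per_row, function(fit) unname(coef(fit)["log2FoldChange_Region"]))

# No residual degrees of freedom remain, so there is nothing to test against
res_df <- sapply(per_row, df.residual)
print(slopes)
print(res_df)
```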


Answers (1)

Bloxx

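Look at my last comment. In short, add ENSEMBL as a predictor so that summary(res) reports a coefficient and p-value for each gene (relative to the reference gene):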

Down_data <- structure(list(ENSEMBL = structure(c(1L, 2L, 3L, 3L, 3L, 4L),
                                         .Label = c("ENSG00000005187","ENSG00000006740", "ENSG00000008277", "ENSG00000013810"),
                                         class = "factor"),
                     log2FoldChange_Expression = c(-2.2756549273843, -1.76655532051033,-1.58489726654531, -1.58489726654531, -1.58489726654531,-2.04282868170093),
                     log2FoldChange_Region = c(-2.11261476936419,-2.37119008459253, -1.59565539803813, -2.4954310786834, -2.11050911441613,-1.81996408306615),
                     Peak_Region = structure(c(5L, 6L, 4L,2L, 3L, 1L),
                                             .Label = c("Peak147010", "Peak194531", "Peak194535","Peak194536", "Peak75759", "Peak81940"),
                                             class = "factor")),
                class = "data.frame",row.names = c(NA,-6L))

res=lm(log2FoldChange_Expression ~ log2FoldChange_Region + ENSEMBL, data=Down_data)
summary(res)
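If you want the p-values from this model attached to each row, one possible sketch is below (data rebuilt from the dput above; coef_tab, term and the pvalue column are illustrative names, not part of the answer). Be careful about what these numbers mean: each ENSEMBL coefficient tests that gene's intercept shift against the reference gene (ENSG00000005187, the first factor level, which has no coefficient and therefore gets NA), not a separate regression per row. Also, on this six-row sample the model has 5 parameters and so only 1 residual degree of freedom, and the three ENSG00000008277 rows share one expression value, so the fit is essentially perfect; summary() may warn and the values should not be interpreted. The sketch only shows the mechanics.

```r
# Rebuild the six-row sample from the question's dput() output
Down_data <- data.frame(
  ENSEMBL = factor(c("ENSG00000005187", "ENSG00000006740", "ENSG00000008277",
                     "ENSG00000008277", "ENSG00000008277", "ENSG00000013810")),
  log2FoldChange_Expression = c(-2.2756549273843, -1.76655532051033, -1.58489726654531,
                                -1.58489726654531, -1.58489726654531, -2.04282868170093),
  log2FoldChange_Region = c(-2.11261476936419, -2.37119008459253, -1.59565539803813,
                            -2.4954310786834, -2.11050911441613, -1.81996408306615),
  Peak_Region = factor(c("Peak75759", "Peak81940", "Peak194536",
                         "Peak194531", "Peak194535", "Peak147010"))
)

res <- lm(log2FoldChange_Expression ~ log2FoldChange_Region + ENSEMBL, data = Down_data)

# Coefficient matrix; on this tiny sample summary() may warn about a perfect fit
coef_tab <- coef(summary(res))

# Non-reference genes have rows named "ENSEMBL<level>"; match() gives NA for the
# reference level, and indexing with NA yields an NA p-value for those rows
term <- paste0("ENSEMBL", Down_data$ENSEMBL)
Down_data$pvalue <- unname(coef_tab[match(term, rownames(coef_tab)), "Pr(>|t|)"])
print(Down_data)
```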
