How to perform linear regression over a dataframe for each row

Question

I have a data frame containing thousands of rows but for simplicity let's consider it has 10 rows. Consider ten different proteins for several patients were measured, and the average is listed in the following dataframe.

proteins Year.1 Year.2 Year.4 Year.5
1        p1   1.90   2.30   2.40   2.80
2        p2   0.90   1.20   1.50   1.90
3        p3   2.30   5.20   6.20   8.70
4        p4   2.10   2.20   2.50   2.60
5        p5   1.85   1.92   1.99   2.01
6        p6   1.20   1.45   1.55   1.65
7        p7   3.50   3.60   3.80   4.10
8        p8   4.20   5.60   6.50   7.20
9        p9   3.80   3.90   4.10   4.50
10      p10  23.00   4.20   6.50   8.90

I need an r code to run the linear regression over each row (e.g row i=1: x=(1,2,3,4), y=(year.1[i,],year.2[i,],year.3[i,],year.4[i,])) and create several column where the intercept, slope, Rsquared can be recorded for them.

I am very new to R and have done some research but not sure how to write the formula for lm function

fold_model_lm<-function(df) {
        lm((x<-c(1,2,3,4))~(y<-c(year.1,year.2,year.3,year.4)), data=df)
}

but it did not work. Any idea how to do this?

langtang · Accepted Answer

Updated to extract r-squared, and to forego the use of `broom::tidy`

dat %>% 
  mutate(id=row_number()) %>% 
  pivot_longer(starts_with("Year")) %>% 
  group_by(id) %>% 
  mutate(x=c(1,2,3,4)) %>% 
  nest() %>% 
  mutate(model = map(data, ~ lm(value ~ x, data = .)),
         result = map(model, function(x) list(intercept= x$coef[1],
                                              slope = x$coef[2],
                                              rsq = summary(x)$r.squared))) %>% 
  unnest_wider(result)

Output:

      id data             model  intercept   slope   rsq
                        
 1     1         1.65   0.28   0.956
 2     2         0.550  0.33   0.995
 3     3         0.550  2.02   0.971
 4     4         1.9    0.18   0.953
 5     5         1.81   0.0550 0.953
 6     6         1.1    0.145  0.940
 7     7         3.25   0.200  0.952
 8     8         3.40   0.99   0.975
 9     9         3.5    0.23   0.92 
10    10        20.7   -4      0.373

Prior Answer

You can use tidyverse and broom

library(tidyverse)
library(broom)
dat %>% 
  mutate(id=row_number()) %>% 
  pivot_longer(starts_with("Year")) %>% 
  group_by(id) %>% 
  mutate(x=c(1,2,3,4)) %>% 
  nest() %>% 
  mutate(model = map(data, ~ lm(value ~ x, data = .)),
         tidied = map(model, tidy)) %>%
  unnest(tidied)

Output:

      id data             model  term        estimate std.error statistic  p.value
                                        
 1     1     (Intercept)   1.65     0.116      14.2   0.00492 
 2     1     x             0.28     0.0424      6.60  0.0222  
 3     2     (Intercept)   0.550    0.0474     11.6   0.00736 
 4     2     x             0.33     0.0173     19.1   0.00274 
 5     3     (Intercept)   0.550    0.681       0.808 0.504   
 6     3     x             2.02     0.249       8.13  0.0148  
 7     4     (Intercept)   1.9      0.0775     24.5   0.00166 
 8     4     x             0.18     0.0283      6.36  0.0238  
 9     5     (Intercept)   1.81     0.0237     76.1   0.000173
10     5     x             0.0550   0.00866     6.35  0.0239  
11     6     (Intercept)   1.1      0.0712     15.5   0.00416 
12     6     x             0.145    0.0260      5.58  0.0306  
13     7     (Intercept)   3.25     0.0866     37.5   0.000709
14     7     x             0.200    0.0316      6.32  0.0241  
15     8     (Intercept)   3.40     0.309      11.0   0.00814 
16     8     x             0.99     0.113       8.78  0.0127  
17     9     (Intercept)   3.5      0.131      26.6   0.00141 
18     9     x             0.23     0.0480      4.80  0.0408  
19    10     (Intercept)  20.7     10.0         2.06  0.176   
20    10     x            -4        3.67       -1.09  0.389

Input:

structure(list(proteins = c("p1", "p2", "p3", "p4", "p5", "p6", 
"p7", "p8", "p9", "p10"), Year.1 = c(1.9, 0.9, 2.3, 2.1, 1.85, 
1.2, 3.5, 4.2, 3.8, 23), Year.2 = c(2.3, 1.2, 5.2, 2.2, 1.92, 
1.45, 3.6, 5.6, 3.9, 4.2), Year.4 = c(2.4, 1.5, 6.2, 2.5, 1.99, 
1.55, 3.8, 6.5, 4.1, 6.5), Year.5 = c(2.8, 1.9, 8.7, 2.6, 2.01, 
1.65, 4.1, 7.2, 4.5, 8.9)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L))

How to perform linear regression over a dataframe for each row

Answers (1)

Updated to extract r-squared, and to forego the use of `broom::tidy`

Prior Answer

Related Questions

How to perform linear regression over a dataframe for each row

Answers (1)

Updated to extract r-squared, and to forego the use of broom::tidy

Prior Answer

Related Questions

Updated to extract r-squared, and to forego the use of `broom::tidy`