Reputation: 23
I have a data frame containing thousands of rows but for simplicity let's consider it has 10 rows. Consider ten different proteins for several patients were measured, and the average is listed in the following dataframe.
proteins Year.1 Year.2 Year.4 Year.5
1 p1 1.90 2.30 2.40 2.80
2 p2 0.90 1.20 1.50 1.90
3 p3 2.30 5.20 6.20 8.70
4 p4 2.10 2.20 2.50 2.60
5 p5 1.85 1.92 1.99 2.01
6 p6 1.20 1.45 1.55 1.65
7 p7 3.50 3.60 3.80 4.10
8 p8 4.20 5.60 6.50 7.20
9 p9 3.80 3.90 4.10 4.50
10 p10 23.00 4.20 6.50 8.90
I need an r code to run the linear regression over each row (e.g row i=1: x=(1,2,3,4), y=(year.1[i,],year.2[i,],year.3[i,],year.4[i,])) and create several column where the intercept, slope, Rsquared can be recorded for them.
I am very new to R and have done some research but not sure how to write the formula for lm function
fold_model_lm<-function(df) {
lm((x<-c(1,2,3,4))~(y<-c(year.1,year.2,year.3,year.4)), data=df)
}
but it did not work. Any idea how to do this?
Upvotes: 2
Views: 1471
Reputation: 24722
broom::tidy
dat %>%
mutate(id=row_number()) %>%
pivot_longer(starts_with("Year")) %>%
group_by(id) %>%
mutate(x=c(1,2,3,4)) %>%
nest() %>%
mutate(model = map(data, ~ lm(value ~ x, data = .)),
result = map(model, function(x) list(intercept= x$coef[1],
slope = x$coef[2],
rsq = summary(x)$r.squared))) %>%
unnest_wider(result)
Output:
id data model intercept slope rsq
<int> <list> <list> <dbl> <dbl> <dbl>
1 1 <tibble [4 x 4]> <lm> 1.65 0.28 0.956
2 2 <tibble [4 x 4]> <lm> 0.550 0.33 0.995
3 3 <tibble [4 x 4]> <lm> 0.550 2.02 0.971
4 4 <tibble [4 x 4]> <lm> 1.9 0.18 0.953
5 5 <tibble [4 x 4]> <lm> 1.81 0.0550 0.953
6 6 <tibble [4 x 4]> <lm> 1.1 0.145 0.940
7 7 <tibble [4 x 4]> <lm> 3.25 0.200 0.952
8 8 <tibble [4 x 4]> <lm> 3.40 0.99 0.975
9 9 <tibble [4 x 4]> <lm> 3.5 0.23 0.92
10 10 <tibble [4 x 4]> <lm> 20.7 -4 0.373
You can use tidyverse and broom
library(tidyverse)
library(broom)
dat %>%
mutate(id=row_number()) %>%
pivot_longer(starts_with("Year")) %>%
group_by(id) %>%
mutate(x=c(1,2,3,4)) %>%
nest() %>%
mutate(model = map(data, ~ lm(value ~ x, data = .)),
tidied = map(model, tidy)) %>%
unnest(tidied)
Output:
id data model term estimate std.error statistic p.value
<int> <list> <list> <chr> <dbl> <dbl> <dbl> <dbl>
1 1 <tibble [4 x 4]> <lm> (Intercept) 1.65 0.116 14.2 0.00492
2 1 <tibble [4 x 4]> <lm> x 0.28 0.0424 6.60 0.0222
3 2 <tibble [4 x 4]> <lm> (Intercept) 0.550 0.0474 11.6 0.00736
4 2 <tibble [4 x 4]> <lm> x 0.33 0.0173 19.1 0.00274
5 3 <tibble [4 x 4]> <lm> (Intercept) 0.550 0.681 0.808 0.504
6 3 <tibble [4 x 4]> <lm> x 2.02 0.249 8.13 0.0148
7 4 <tibble [4 x 4]> <lm> (Intercept) 1.9 0.0775 24.5 0.00166
8 4 <tibble [4 x 4]> <lm> x 0.18 0.0283 6.36 0.0238
9 5 <tibble [4 x 4]> <lm> (Intercept) 1.81 0.0237 76.1 0.000173
10 5 <tibble [4 x 4]> <lm> x 0.0550 0.00866 6.35 0.0239
11 6 <tibble [4 x 4]> <lm> (Intercept) 1.1 0.0712 15.5 0.00416
12 6 <tibble [4 x 4]> <lm> x 0.145 0.0260 5.58 0.0306
13 7 <tibble [4 x 4]> <lm> (Intercept) 3.25 0.0866 37.5 0.000709
14 7 <tibble [4 x 4]> <lm> x 0.200 0.0316 6.32 0.0241
15 8 <tibble [4 x 4]> <lm> (Intercept) 3.40 0.309 11.0 0.00814
16 8 <tibble [4 x 4]> <lm> x 0.99 0.113 8.78 0.0127
17 9 <tibble [4 x 4]> <lm> (Intercept) 3.5 0.131 26.6 0.00141
18 9 <tibble [4 x 4]> <lm> x 0.23 0.0480 4.80 0.0408
19 10 <tibble [4 x 4]> <lm> (Intercept) 20.7 10.0 2.06 0.176
20 10 <tibble [4 x 4]> <lm> x -4 3.67 -1.09 0.389
Input:
structure(list(proteins = c("p1", "p2", "p3", "p4", "p5", "p6",
"p7", "p8", "p9", "p10"), Year.1 = c(1.9, 0.9, 2.3, 2.1, 1.85,
1.2, 3.5, 4.2, 3.8, 23), Year.2 = c(2.3, 1.2, 5.2, 2.2, 1.92,
1.45, 3.6, 5.6, 3.9, 4.2), Year.4 = c(2.4, 1.5, 6.2, 2.5, 1.99,
1.55, 3.8, 6.5, 4.1, 6.5), Year.5 = c(2.8, 1.9, 8.7, 2.6, 2.01,
1.65, 4.1, 7.2, 4.5, 8.9)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L))
Upvotes: 3