I have a data set in R with the following variables: species name, year, and a count for each year. I fit a simple linear regression of count on year for each species using data.table and collected the coefficients in a table:
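library(data.table)
# one lm() fit per species; keep only the intercept and the slope on year
linmodel = data[,
  list(intercept = coef(lm(x ~ year))[1], coef = coef(lm(x ~ year))[2]),
  by = English_Common_Name]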
This only gives the intercept and slope of each regression. Is there a way to also get the p-values and R-squared values and add them as columns of the output?
Here is what a sample of the data looks like:
AOU English_Common_Name year x
1 1320 Mallard 1995 444
2 1320 Mallard 1996 550
3 1320 Mallard 1997 335
4 1320 Mallard 1998 351
5 1320 Mallard 1999 266
6 1320 Mallard 2000 597
7 1320 Mallard 2001 620
8 1320 Mallard 2002 246
9 1320 Mallard 2003 635
10 1320 Mallard 2004 301
11 1320 Mallard 2005 211
12 1320 Mallard 2006 191
13 1320 Mallard 2007 223
14 1320 Mallard 2008 210
15 1320 Mallard 2009 219
16 1320 Mallard 2010 166
17 1320 Mallard 2011 115
18 1320 Mallard 2012 92
19 1320 Mallard 2013 47
20 1320 Mallard 2014 100
21 1350 Gadwall 1995 37
22 1350 Gadwall 1996 12
23 1350 Gadwall 1997 11
24 1350 Gadwall 1998 11
25 1350 Gadwall 1999 5
26 1350 Gadwall 2000 3
27 1350 Gadwall 2001 4
28 1350 Gadwall 2002 6
29 1350 Gadwall 2003 5
30 1350 Gadwall 2004 9
31 1350 Gadwall 2005 17
32 1350 Gadwall 2006 4
33 1350 Gadwall 2007 15
34 1350 Gadwall 2008 16
35 1350 Gadwall 2009 3
36 1350 Gadwall 2010 23
37 1350 Gadwall 2011 2
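For anyone who wants to run the code in this thread, the sample above can be typed in as a data.table. This is a sketch that assumes data is a data.table (the data[, list(...), by = ...] call in the question is data.table syntax) and copies only the Mallard and Gadwall rows printed above.

```r
library(data.table)

# Values copied from the printed sample: Mallard 1995-2014, Gadwall 1995-2011
data = data.table(
  AOU = rep(c(1320L, 1350L), times = c(20, 17)),
  English_Common_Name = rep(c("Mallard", "Gadwall"), times = c(20, 17)),
  year = c(1995:2014, 1995:2011),
  x = c(444, 550, 335, 351, 266, 597, 620, 246, 635, 301,
        211, 191, 223, 210, 219, 166, 115, 92, 47, 100,
        37, 12, 11, 11, 5, 3, 4, 6, 5, 9, 17, 4, 15, 16, 3, 23, 2)
)
```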
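This can be done with dplyr's do(), fitting one model per species and pulling the R-squared, the p-value and the coefficients out of each fit: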
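library(dplyr)
library(magrittr)  # provides use_series() ($) and extract2() ([[)

result =
  data %>%
  group_by(English_Common_Name) %>%
  do({
    fit = lm(x ~ year, data = .)
    tibble(
      # plain R squared; use adj.r.squared instead for the adjusted value
      r_squared = fit %>% summary %>% use_series(r.squared),
      # p-value of the F test for year (first row of the ANOVA table)
      p_value = fit %>% anova %>% use_series(`Pr(>F)`) %>% extract2(1)
    ) %>%
      # append the coefficients as columns `(Intercept)` and year
      bind_cols(fit %>% coef %>% as.list %>% as_tibble)
  })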
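An alternative that stays with the data.table syntax from the question: summary() of an lm fit already holds the coefficient table (including t-test p-values) and both R-squared values, so each species' model only needs to be fit once. A minimal sketch, assuming the column names from the sample; the output names slope, p_value, r_squared and adj_r_squared are my own choice.

```r
library(data.table)

# Sample rows from the question (Mallard 1995-2014, Gadwall 1995-2011)
data = data.table(
  English_Common_Name = rep(c("Mallard", "Gadwall"), times = c(20, 17)),
  year = c(1995:2014, 1995:2011),
  x = c(444, 550, 335, 351, 266, 597, 620, 246, 635, 301,
        211, 191, 223, 210, 219, 166, 115, 92, 47, 100,
        37, 12, 11, 11, 5, 3, 4, 6, 5, 9, 17, 4, 15, 16, 3, 23, 2)
)

# Fit once per species and read everything from summary.lm():
# row 2 of the coefficient matrix is the slope on year
linmodel = data[, {
  s = summary(lm(x ~ year))
  list(
    intercept     = s$coefficients[1, "Estimate"],
    slope         = s$coefficients[2, "Estimate"],
    p_value       = s$coefficients[2, "Pr(>|t|)"],
    r_squared     = s$r.squared,
    adj_r_squared = s$adj.r.squared
  )
}, by = English_Common_Name]

print(linmodel)
```

With a single predictor, the slope's t-test p-value equals the Pr(>F) in the first row of anova(), which is what the dplyr version extracts, because F = t² in that case.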