Oliver Oliver

Reputation: 2317

Get p-values and R-squared values for a simple linear model by group in R

I have a data frame in R with the following variables: species name, year, and count data for each year. I fit a simple linear regression for each species and collected the output coefficients in a data frame as follows:

 linmodel = data[,
                 list(intercept=coef(lm(x~year))[1], coef=coef(lm(x~year))[2]),
                 by=English_Common_Name]

This method only produces the intercept and slope of each regression. Is there a way to obtain the p-values and R-squared values as well and place them as columns in the output data frame?

Here is what a sample of the data looks like:

     AOU    English_Common_Name year    x
 1  1320    Mallard 1995    444
 2  1320    Mallard 1996    550
 3  1320    Mallard 1997    335
 4  1320    Mallard 1998    351
 5  1320    Mallard 1999    266
 6  1320    Mallard 2000    597
 7  1320    Mallard 2001    620
 8  1320    Mallard 2002    246
 9  1320    Mallard 2003    635
 10 1320    Mallard 2004    301
 11 1320    Mallard 2005    211
 12 1320    Mallard 2006    191
 13 1320    Mallard 2007    223
 14 1320    Mallard 2008    210
 15 1320    Mallard 2009    219
 16 1320    Mallard 2010    166
 17 1320    Mallard 2011    115
 18 1320    Mallard 2012    92
 19 1320    Mallard 2013    47
 20 1320    Mallard 2014    100
 21 1350    Gadwall 1995    37
 22 1350    Gadwall 1996    12
 23 1350    Gadwall 1997    11
 24 1350    Gadwall 1998    11
 25 1350    Gadwall 1999    5
 26 1350    Gadwall 2000    3
 27 1350    Gadwall 2001    4
 28 1350    Gadwall 2002    6
 29 1350    Gadwall 2003    5
 30 1350    Gadwall 2004    9
 31 1350    Gadwall 2005    17
 32 1350    Gadwall 2006    4
 33 1350    Gadwall 2007    15
 34 1350    Gadwall 2008    16
 35 1350    Gadwall 2009    3
 36 1350    Gadwall 2010    23
 37 1350    Gadwall 2011    2

Upvotes: 0

Views: 923

Answers (1)

bramtayl

Reputation: 4024

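This can be done using dplyr's do(), which runs a block of code once per group. Note that the r_squared column below holds the adjusted R squared (adj.r.squared); use r.squared instead if you want the unadjusted value.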

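library(dplyr)
library(magrittr)  # for use_series() and extract2()

result =
  data %>%
  group_by(English_Common_Name) %>%
  do({
    # fit one regression per species; `.` is the current group's rows
    model = lm(x ~ year, data = .)
    # tibble() and as_tibble() replace the deprecated data_frame() and as_data_frame()
    tibble(r_squared =
             model %>%
             summary %>%
             # adjusted R squared; use r.squared for the unadjusted value
             use_series(adj.r.squared),
           # p value of the F test for year (first row of the ANOVA table)
           p_value =
             model %>%
             anova %>%
             use_series(`Pr(>F)`) %>%
             extract2(1)) %>%
      bind_cols(
        # intercept and slope, as columns named (Intercept) and year
        model %>%
          coef %>%
          as.list %>%
          as_tibble)})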
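
If you would rather stay with the data.table syntax from the question, something along these lines might also work. This is only a sketch: fit_stats is a hypothetical helper name, the output column names are my own choice, and the example rows are just the first five years of Mallard and Gadwall counts from the question. It fits the model once per group and reads everything off summary(), taking the p-value from the slope's row of the coefficient table (for a one-predictor model this should match the F-test p-value).

```r
library(data.table)

# First five years of each species from the question's sample data
data <- data.table(
  English_Common_Name = rep(c("Mallard", "Gadwall"), each = 5),
  year = rep(1995:1999, times = 2),
  x = c(444, 550, 335, 351, 266, 37, 12, 11, 11, 5)
)

# Hypothetical helper: fit once, return the statistics as a named list
fit_stats <- function(x, year) {
  s <- summary(lm(x ~ year))
  list(
    intercept     = s$coefficients[1, "Estimate"],
    slope         = s$coefficients[2, "Estimate"],
    p_value       = s$coefficients[2, "Pr(>|t|)"],  # t test on the slope
    r_squared     = s$r.squared,
    adj_r_squared = s$adj.r.squared
  )
}

# Each list element becomes a column; one row per species
linmodel <- data[, fit_stats(x, year), by = English_Common_Name]
print(linmodel)
```

Returning a list from the helper means lm() is only called once per species, rather than once per coefficient as in the original snippet.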

Upvotes: 2
