Kristen Cyr
Kristen Cyr

Reputation: 726

How to perform linear regression for multiple columns and get a dataframe output with: regression equation and r squared value?

My dataframe looks like this

df = structure(list(Date_Time_GMT_3 = structure(c(1625025600, 1625026500,1625027400, 1625028300, 1625029200, 1625030100), 
                                                class = c("POSIXct", "POSIXt"), tzone = "EST"), 
                    X20676887_X2LH_S = c(26.879, 26.781,26.683, 26.585, 26.488, 26.39), 
                    X20819831_11LH_S = c(26.39, 26.292, 26.195, 26.195, 26.097, 26), 
                    X20822214_X4LH_S = c(26.39, 26.292,26.292, 26.195, 26.097, 26), 
                    LH27_20822244_U_Stationary = c(23.388, 23.292, 23.292, 23.196, 23.196, 23.196)), 
               row.names = 2749:2754, class = "data.frame")

and I'm trying to get the linear regression equations and R squared values for all columns where the column with the string "Stationary" in it will always be on the x-axis.

so far I can perform the linear regression for 1 column against the "stationary" column


model = lm(df$LH27_20822244_U_Stationary ~
             df$X20822214_X4LH_S, df)

and when I use

summary(model)

afterwards it gives me some values I would like in a dataframe (i.e R squared, Estimate Std., Std. Error, Pr(>|t|)) but 2 things I need to help with are:

  1. I still need the regression equation for each column that doesn't have stationary in the name
  2. I need these values for each of the columns that don't have stationary in it's name, and I need that to be a dataframe that looks like so...
 Logger_ID        Reg_equation R_Squared Estimate_Std. Std_Error  Pr_t..
  <chr>            <int>               <int>     <int>        <int>     <int>   
1 X20676887_X2LH_S NA                  NA        NA            NA         NA      
2 X20819831_11LH_S NA                  NA        NA            NA         NA      
3 X20822214_X4LH_S NA                  NA        NA            NA         NA      

Upvotes: 0

Views: 1738

Answers (1)

TarJae
TarJae

Reputation: 79266

Something like this:

library(tidyverse)
library(broom)
df1 %>% 
  pivot_longer(
    cols = starts_with("X")
  ) %>% 
  mutate(name = factor(name)) %>% 
  group_by(name) %>% 
  group_split() %>% 
  map_dfr(.f = function(df){
    lm(LH27_20822244_U_Stationary ~ value, data = df) %>% 
      glance() %>% 
      # tidy() %>%  
      add_column(name = unique(df$name), .before=1)
  })

Using tidy()

  name             term        estimate std.error statistic p.value
  <fct>            <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 X20676887_X2LH_S (Intercept)   12.8      2.28        5.62 0.00494
2 X20676887_X2LH_S value          0.393    0.0855      4.59 0.0101 
3 X20819831_11LH_S (Intercept)   10.4      3.72        2.79 0.0495 
4 X20819831_11LH_S value          0.492    0.142       3.47 0.0256 
5 X20822214_X4LH_S (Intercept)   10.5      3.30        3.20 0.0329 
6 X20822214_X4LH_S value          0.485    0.126       3.86 0.0182 

Using glance()

  name          r.squared adj.r.squared  sigma statistic p.value    df logLik   AIC   BIC deviance df.residual  nobs
  <fct>             <dbl>         <dbl>  <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1 X20676887_X2~     0.841         0.801 0.0350      21.1  0.0101     1   12.8 -19.6 -20.3  0.00490           4     6
2 X20819831_11~     0.751         0.688 0.0438      12.0  0.0256     1   11.5 -17.0 -17.6  0.00766           4     6
3 X20822214_X4~     0.788         0.735 0.0403      14.9  0.0182     1   12.0 -17.9 -18.6  0.00651           4     6

Upvotes: 3

Related Questions