Yolikins15

Reputation: 1

Applying a least squares model to multiple columns of data in R

I have an R data frame consisting of 308 observations of 93 variables (i.e., 308 rows of data organized into 93 columns). One column is the location, a factor with 19 levels, and another column is the date of observation. The remaining 91 columns (3 through 93) contain concentration measurements for various chemicals.

I would like to apply a least squares regression (lm function) to each column of concentration measurements (the y variable) against the sampling date (the x variable), with the regressions grouped by location.

I've tried the following code, which works for a single column of concentration measurements and returns a list of 19 lm fits, one per location. How do I adjust the code so that it runs over all 91 columns of measurements?

df.2 <- split(df, f=df$location)

model_lm <- lapply(df.2, function(x)
            lm(chemical ~ sample_date, data = x))

Upvotes: 0

Views: 266

Answers (2)

GuedesBF

Reputation: 9858

Maybe we can use dplyr to create list columns with the desired linear models:

library(dplyr)
df %>% group_by(location) %>%
        summarise(across(-sample_date, ~list(lm(data=tibble(.x, sample_date), formula=.x ~ sample_date))))

A reproducible example with the iris dataset:

output<-iris %>% group_by(Species) %>%
        summarise(across(1:3, ~list(lm(data=tibble(.x, Petal.Width), formula=.x ~ Petal.Width))))

output
# A tibble: 3 x 4
  Species    Sepal.Length Sepal.Width Petal.Length
  <fct>      <list>       <list>      <list>      
1 setosa     <lm>         <lm>        <lm>        
2 versicolor <lm>         <lm>        <lm>        
3 virginica  <lm>         <lm>        <lm>  

You can pull any model with base subsetting:

output$Sepal.Length[output$Species=='setosa']

[[1]]

Call:
lm(formula = Sepal.Length ~ Petal.Width, data = tibble(Sepal.Length, 
    Petal.Width))

Coefficients:
(Intercept)  Petal.Width  
     4.7772       0.9302  
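Once the models sit in list columns, a common next step is to pull a single number out of each one. The following is a minimal sketch, assuming the same iris example as above; the name slopes is only illustrative, and it extracts the Petal.Width slope of every Sepal.Length model, one per species:

```r
library(dplyr)

# Same grouped fit as in the iris example above
output <- iris %>%
  group_by(Species) %>%
  summarise(across(1:3, ~ list(lm(data = tibble(.x, Petal.Width),
                                  formula = .x ~ Petal.Width))))

# Each cell of a list column holds one lm object, so sapply() can walk over them.
# coef() returns a named vector; "Petal.Width" is the slope term.
slopes <- sapply(output$Sepal.Length, function(m) coef(m)[["Petal.Width"]])
names(slopes) <- as.character(output$Species)

slopes
```

The same pattern should work for any other column of output, or with summary() in place of coef() if standard errors or p-values are needed.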

Upvotes: 1

akrun

Reputation: 887118

We could pass the left-hand side as a matrix, so that a single lm call per location fits one regression for each response column:

lapply(df.2, function(x) lm(as.matrix(x[3:93]) ~ sample_date, data = x))

Using a small reproducible example:

data(mtcars)
lm(as.matrix(mtcars[1:5])~ vs, mtcars)
Call:
lm(formula = as.matrix(mtcars[1:5]) ~ vs, data = mtcars)

Coefficients:
             mpg        cyl        disp       hp         drat     
(Intercept)    16.6167     7.4444   307.1500   189.7222     3.3922
vs              7.9405    -2.8730  -174.6929   -98.3651     0.4671
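To mirror the grouped setup from the question, the matrix left-hand side can be combined with split() and lapply(). This is a sketch under assumptions: it uses mtcars, with cyl standing in for the location column and wt for the sampling date, and the names groups, fits, and slopes are only illustrative. Each fit is an "mlm" object, and coef() on it returns a matrix with one column per response:

```r
# Split by the grouping column (stand-in for location)
groups <- split(mtcars, mtcars$cyl)

# One lm call per group; the matrix on the left fits all responses at once
fits <- lapply(groups, function(d) {
  lm(as.matrix(d[c("mpg", "disp", "hp")]) ~ wt, data = d)
})

# coef() of an "mlm" fit is a matrix: rows are terms, columns are responses.
# Taking the "wt" row from each group gives a groups-by-responses slope table.
slopes <- t(sapply(fits, function(m) coef(m)["wt", ]))

slopes
```

Each column of the coefficient matrix matches what a separate lm on that response would give, so this is mainly a convenience for fitting many responses against the same predictor.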

Upvotes: 2
