Reputation: 3
EDIT: I've resorted to using the mtcars dataset instead, as it better represents the structure of my real data. Disregard the iris example below:
I'm looking to run several regression models with multiple dependent and independent variables.
Say I'm working on the iris dataset.
My dependent variables would be Petal.Length
and Petal.Width
.
My independent variables would be Sepal.Length
and Sepal.Width
.
I would like to run the following models for each species in the dataset separately (setosa
, versicolor
, virginica
):
lm(Petal.Length ~ Sepal.Length)
lm(Petal.Length ~ Sepal.Width)
lm(Petal.Width ~ Sepal.Length)
lm(Petal.Width ~ Sepal.Width)
My real dataset has many more dependent and independent variables, which is why I'd like to write code that is less repetitious.
Here is my attempt so far, where I've nested the Species
column with Tidyr
, and mutated a new column containing the regression data for each species:
library(dplyr)
library(tidyr)
library(purrr)
iris %>%
nest(data = !Species) %>%
mutate(
fit = map(data, ~ lm (cbind(Petal.Length, Petal.Width) ~ ., data = .x))
)
Which gives the equivalent of this for each species:
lm(Petal.Length ~ Sepal.Length + Sepal.Width)
lm(Petal.Width ~ Sepal.Length + Sepal.Width)
Let's use the mtcars dataset instead:
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
...
My dependent variables would be disp
and hp
.
My independent variables would be drat
, wt
, and qsec
.
I would like to run the following models for each vs
and am
type in the dataset separately:
lm(disp ~ drat)
lm(disp ~ wt)
lm(disp ~ qsec)
lm(hp ~ drat)
lm(hp ~ wt)
lm(hp ~ qsec)
My real dataset has many more dependent and independent variables, which is why I'd like to write code that is less repetitious.
Here is my attempt so far, where I've nested the vs
and am
columns with Tidyr
, and mutated a new column containing the regression data for each vs
and am
:
library(dplyr)
library(tidyr)
library(purrr)
mtcars %>%
group_by(vs,am) %>%
nest() %>%
mutate(
fit = map(data, ~ lm (cbind(disp, hp) ~ drat + wt + qsec, data = .x))
)
# A tibble: 4 x 4
# Groups: vs, am [4]
vs am data fit
<dbl> <dbl> <list> <list>
1 0 1 <tibble [6 x 9]> <mlm>
2 1 1 <tibble [7 x 9]> <mlm>
3 1 0 <tibble [7 x 9]> <mlm>
4 0 0 <tibble [12 x 9]> <mlm>
Which gives the equivalent of the following formulas for each combination of vs
and am
:
lm(disp ~ drat + wt + qsec)
lm(hp ~ drat + wt + qsec)
But this isn't what I want. The above code gives two multiple regressions per vs
and am
combination, when I want six simple regressions. I have no idea how to produce this in R.
Upvotes: 0
Views: 205
Reputation: 79238
This is modelling, and using the correct formula after reshaping the data, you can get the desired results without the loops.
new_data <- pivot_longer(iris, c(Sepal.Length, Sepal.Width),
names_to = 'Sepal', names_prefix = 'Sepal')
lm(cbind(Petal.Width, Petal.Length) ~ value/(Species:Sepal) - value +
Sepal/Species - Sepal + 0, new_data)
Call:
lm(formula = cbind(Petal.Width, Petal.Length) ~ value/(Species:Sepal) -
value + Sepal/Species - Sepal + 0, data = new_data)
Coefficients:
Petal.Width Petal.Length
Speciessetosa:Sepal.Length -0.17022 0.80305
Speciesversicolor:Sepal.Length 0.08326 0.18512
Speciesvirginica:Sepal.Length 1.22611 0.61047
Speciessetosa:Sepal.Width 0.02418 1.18292
Speciesversicolor:Sepal.Width 0.16691 1.93492
Speciesvirginica:Sepal.Width 0.66406 3.51090
value:Speciessetosa:Sepal.Length 0.08314 0.13163
value:Speciesversicolor:Sepal.Length 0.20936 0.68647
value:Speciesvirginica:Sepal.Length 0.12142 0.75008
value:Speciessetosa:Sepal.Width 0.06471 0.08141
value:Speciesversicolor:Sepal.Width 0.41845 0.83938
value:Speciesvirginica:Sepal.Width 0.45795 0.68632
If you want to run the for loops:
iris %>%
nest(data = !Species) %>%
summarise(model = map(c('Sepal.Length', 'Sepal.Width'),
~map(data,~lm(reformulate(.y, 'cbind(Petal.Width, Petal.Length)'),.x) %>%
coef()%>%
as_tibble(rownames = 'rn'), .y = .x)%>%
set_names(Species)%>%
bind_rows(.id='Species')))%>%
unnest(model)
# A tibble: 12 x 4
Species rn Petal.Width Petal.Length
<chr> <chr> <dbl> <dbl>
1 setosa (Intercept) -0.170 0.803
2 setosa Sepal.Length 0.0831 0.132
3 versicolor (Intercept) 0.0833 0.185
4 versicolor Sepal.Length 0.209 0.686
5 virginica (Intercept) 1.23 0.610
6 virginica Sepal.Length 0.121 0.750
7 setosa (Intercept) 0.0242 1.18
8 setosa Sepal.Width 0.0647 0.0814
9 versicolor (Intercept) 0.167 1.93
10 versicolor Sepal.Width 0.418 0.839
11 virginica (Intercept) 0.664 3.51
12 virginica Sepal.Width 0.458 0.686
EDIT: in the case of mtcars dataset, the following will suffice:
pivot_longer(mtcars, c(drat, wt, qsec)) %>%
unite('am_vs', c(am, vs)) %>%
lm(cbind(disp, hp)~value/(am_vs:name)-value + name/am_vs - name + 0, .)
Running using the map
:
mtcars%>%
nest_by(vs, am)%>%
rowwise()%>%
summarise(vs, am, model = map(c('drat', 'wt', 'qsec'),
~lm(sprintf('cbind(disp, hp)~%s',.x), data)%>%
coef()%>%
as_tibble(rownames = 'rn')))%>%
unnest(model)
Upvotes: 0