Reputation: 71
I have a dataset like this:
library(tidyverse)
set.seed(123)
Data <- data.frame(
X = sample(c("A", "B", "C"), 20, replace = TRUE),
Y = sample(1:20)
)
Data%>%
arrange(X)
I would like to run series of regressions such that DV is Y but independent variables for each regression are factors taken two at a time. For instance, A&B, A&C,B&A,B&C Thanks for your help.
Upvotes: 0
Views: 134
Reputation: 136
For linear regression:
> combos <- as.data.frame(x = combn(x = c("A", "B", "C"), m = 2))
names(combos) <- sapply(
X = 1:3,
FUN = function(x){paste(combos[,x], collapse = "")}
)
> fit.list <- list()
> for (combo in names(combos)){
fit.list[[combo]] <- subset(Data, X %in% combos[,combo]) %>%
lm(formula = .$Y ~ .$X, data = .)
}
> fit.list
$AB
Call:
lm(formula = .$Y ~ .$X, data = .)
Coefficients:
(Intercept) .$XB
10.4 3.6
$AC
Call:
lm(formula = .$Y ~ .$X, data = .)
Coefficients:
(Intercept) .$XC
10.4 -2.9
$BC
Call:
lm(formula = .$Y ~ .$X, data = .)
Coefficients:
(Intercept) .$XC
14.0 -6.5
EDIT For adding a covariate (e.g. Z), one way is to add as a new column in the data.frame, then add the column name to the model:
> set.seed(123)
> Data <- data.frame(
X = sample(c("A", "B", "C"), 20, replace = TRUE),
Y = sample(1:20),
Z = sample(1:20)
)
> fit.list <- list()
> for (combo in names(combos)){
fit.list[[combo]] <- subset(Data, X %in% combos[,combo]) %>%
lm(formula = .$Y ~ .$X + .$Z, data = .)
}
> fit.list
$AB
Call:
lm(formula = .$Y ~ .$X + .$Z, data = .)
Coefficients:
(Intercept) .$XB .$Z
3.6697 3.1921 0.5099
$AC
Call:
lm(formula = .$Y ~ .$X + .$Z, data = .)
Coefficients:
(Intercept) .$XC .$Z
7.9306 -1.5063 0.1871
$BC
Call:
lm(formula = .$Y ~ .$X + .$Z, data = .)
Coefficients:
(Intercept) .$XC .$Z
14.89888 -7.02970 -0.06421
Upvotes: 2