Reputation: 592
I need to train a regression model with 100 features, and I want to look for interaction effects between each of those 100 features and one other feature. I would like to do this programmatically, since the analysis will be recurring and I don't want to write a new formula by hand each time it runs. How can I get a model like this
Y ~ a*b + a*c + ... + a*z
but for 100 terms? How do I write the R-style formula to do this? Note that I will be using statsmodels in Python, but I think the syntax is the same.
Upvotes: 0
Views: 79
Reputation: 3194
Here is an example of how to construct the desired string in R and then convert it to a formula:
paste("a", letters[2:26], sep = "*") |>
paste(collapse = " + ") |>
sprintf(fmt = "Y ~ %s") |>
as.formula()
##> Y ~ a * b + a * c + a * d + a * e + a * f + a * g + a * h + a *
##> i + a * j + a * k + a * l + a * m + a * n + a * o + a * p +
##> a * q + a * r + a * s + a * t + a * u + a * v + a * w + a *
##> x + a * y + a * z
Upvotes: 1
Reputation: 79298
lm(Y ~ a * ., df)
For example:
lm(Sepal.Width ~ Sepal.Length * ., iris)
Call:
lm(formula = Sepal.Width ~ Sepal.Length * ., data = iris)

Coefficients:
                   (Intercept)                    Sepal.Length                    Petal.Length                     Petal.Width
                      -0.91350                         0.82954                         0.29569                         0.85334
             Speciesversicolor                Speciesvirginica       Sepal.Length:Petal.Length        Sepal.Length:Petal.Width
                       0.05894                        -0.89244                        -0.05394                        -0.04654
Sepal.Length:Speciesversicolor   Sepal.Length:Speciesvirginica
                      -0.32823                        -0.21910
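Since the question mentions statsmodels: as far as I know, patsy (the formula parser statsmodels uses) does not expand the `.` shorthand, so the column names would have to be listed explicitly. Below is a minimal sketch of one way to emulate `Y ~ a * .` in Python. The function name `dot_interaction_formula` and the underscore-style column names are made up for illustration (dots in names would not parse as plain identifiers in a patsy formula).

```python
def dot_interaction_formula(response, focal, columns):
    """Emulate R's `response ~ focal * .` by listing the other columns explicitly."""
    # Everything except the response and the focal column goes inside the parentheses
    others = [c for c in columns if c not in (response, focal)]
    return f"{response} ~ {focal} * ({' + '.join(others)})"

# Hypothetical column names; with pandas this could be list(df.columns)
columns = ["Sepal_Width", "Sepal_Length", "Petal_Length", "Petal_Width", "Species"]
formula = dot_interaction_formula("Sepal_Width", "Sepal_Length", columns)
print(formula)
# Sepal_Width ~ Sepal_Length * (Petal_Length + Petal_Width + Species)
```

The `a * (b + c)` form expands to the main effects plus the `a:b` and `a:c` interactions, so it should give the same terms as writing `a*b + a*c` out in full.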
Upvotes: 3
Reputation: 592
Solution: build the formula string in Python with a loop:
# these would be the columns of a dataframe
effects_list = ['regressor_col', 'A', 'B', 'C', 'D', 'E', 'F']
interaction = effects_list[3]  # column to interact with every other feature
regressor = effects_list[0]    # response column (left-hand side of the formula)
terms = []
for effect in effects_list:
    # skip the response column and the interaction column itself
    if effect != interaction and effect != regressor:
        terms.append(effect + '*' + interaction)
# join the terms so the right-hand side does not start with a stray '+'
formula = regressor + ' ~ ' + ' + '.join(terms)
print(formula)
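Because the analysis is meant to be rerun on different data, the same idea could be wrapped in a function. The sketch below is one possible version, not a definitive implementation: the names `quote_term` and `build_interaction_formula` and the example column `dose (mg)` are hypothetical. It assumes the formula will be parsed by patsy, whose `Q()` helper lets a formula refer to column names that are not valid Python identifiers.

```python
import keyword

def quote_term(name):
    """Wrap names that are not plain Python identifiers in patsy's Q()."""
    if name.isidentifier() and not keyword.iskeyword(name):
        return name
    return f"Q({name!r})"

def build_interaction_formula(response, focal, columns):
    """Build 'response ~ x1*focal + x2*focal + ...' for every other column."""
    others = [c for c in columns if c not in (response, focal)]
    terms = [f"{quote_term(c)}*{quote_term(focal)}" for c in others]
    return f"{quote_term(response)} ~ " + " + ".join(terms)

# Hypothetical columns; with pandas this could be list(df.columns)
columns = ["regressor_col", "A", "B", "dose (mg)", "D"]
formula = build_interaction_formula("regressor_col", "D", columns)
print(formula)
# regressor_col ~ A*D + B*D + Q('dose (mg)')*D
```

The resulting string can then be passed to something like `statsmodels.formula.api.ols(formula, data=df).fit()`, where `df` is the DataFrame holding those columns.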
Upvotes: 0