Reputation: 258
I am currently using a variable selection technique that requires me to determine if the coefficient for any given variable changed by more than 20% between models with different combinations of variables. I tried:
abs(model1$coefficients - model2$coefficients)/model1$coefficients
but the vectors are not the same length (because there are different variables in each model) so they are not lined up properly. Is there a way to compare coefficients with the same variable name across models? I could do this by hand, but there are 50+ coefficients and 10 models so it would take forever.
Sorry if this is obvious, but I have not been able to figure it out. I have looked around for answers to point me in the right direction, but all of them have to do with statistical comparisons of coefficients and do not include code that helps me solve this issue.
Upvotes: 3
Views: 2219
Reputation: 50678
You don't give any sample data, so I am going to simulate data based on a model y = a + b * x1 + c * x2 + e, where e ~ N(0, 1).
I then fit two models, y ~ x1 and y ~ x1 + x2, and use a custom function getEstimates to extract parameters for the same predictor from both models. It's also a good idea to assess the importance of additional predictors using an ANOVA.
# Simulate some data
set.seed(2017);
generateData <- function(a = 1, b = 2, c = -2, nPoints = 1000) {
    x1 <- runif(nPoints);
    x2 <- runif(nPoints);
    y <- a + b * x1 + c * x2 + rnorm(nPoints);
    return(data.frame(y = y, x1 = x1, x2 = x2));
}
df <- generateData();
# Fit1: y ~ a + b * x1
fit1 <- lm(y ~ x1, data = df);
# Fit2: y ~ a + b * x1 + c * x2
fit2 <- lm(y ~ x1 + x2, data = df);
# ANOVA to explore importance of variable
anova(fit1, fit2);
#Analysis of Variance Table
#
#Model 1: y ~ x1
#Model 2: y ~ x1 + x2
# Res.Df RSS Df Sum of Sq F Pr(>F)
#1 998 1292.20
#2 997 994.46 1 297.74 298.5 < 2.2e-16 ***
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Function to get estimates for parameter(s) par
# from two models fit1 and fit2
getEstimates <- function(par, fit1, fit2) {
    lst <- lapply(par, function(x)
        c(summary(fit1)$coef[x, 1], summary(fit2)$coef[x, 1]));
    names(lst) <- par;
    return(lst);
}
# Get coefficient for predictor x1
est <- getEstimates("x1", fit1, fit2);
Based on the output of getEstimates you can then calculate the relative change of a parameter between two models.
# Calculate relative change in estimated x1 coefficient from both models
# (abs() in the denominator keeps the ratio non-negative for negative coefficients)
lapply(est, function(x) abs(x[1] - x[2]) / abs(x[1]));
#$x1
#[1] 0.0282493
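Since the question mentions 50+ coefficients and 10 models, here is a rough sketch of how the same idea could be extended by matching coefficients on their names. The helper relChange and the names models, dat and rc are hypothetical and not part of the code above; the sketch assumes every model supports coef() and that one model (the first by default) serves as the reference.

```r
# Hypothetical helper (an assumption, not from the answer above): align
# coefficients by name across a list of fitted models and compute the
# relative change of each estimate against a reference model.
relChange <- function(models, ref = 1) {
    # Union of all coefficient names that appear in any model
    allNames <- unique(unlist(lapply(models, function(m) names(coef(m)))))
    # Rows = coefficients, columns = models; NA where a model lacks the term
    est <- do.call(cbind, lapply(models, function(m) coef(m)[allNames]))
    rownames(est) <- allNames
    # Relative change versus the reference column; terms missing from the
    # reference model (or from the compared model) come out as NA
    abs(est - est[, ref]) / abs(est[, ref])
}

# Small simulated example with the same structure as above
set.seed(2017)
n <- 1000
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- 1 + 2 * dat$x1 - 2 * dat$x2 + rnorm(n)

models <- list(m1 = lm(y ~ x1, data = dat),
               m2 = lm(y ~ x1 + x2, data = dat))

rc <- relChange(models)
rc

# Row/column positions of coefficients that changed by more than 20%
which(rc > 0.2, arr.ind = TRUE)
```

Indexing coef(m) by name is what keeps the rows lined up, so vectors of different lengths are no longer a problem. Note that the intercept is included in the comparison; drop that row if it is not of interest.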
Upvotes: 2