j_3265

Reputation: 207

Comparison of regression models in terms of the importance of variables

I would like to compare models (multiple regression, LASSO, Ridge, GBM) in terms of variable importance, but I'm not sure the procedure is correct, because the values obtained are not on the same scale.

For multiple regression and GBM, the values range from 0 to 100 when computed with varImp from the caret package. The statistic is calculated differently for each method:

Linear Models: the absolute value of the t-statistic for each model parameter is used.

Boosted Trees: this method uses the same approach as a single tree, but sums the importances over all boosting iterations.
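For reference, here is a minimal base-R sketch of the raw linear-model measure described above (absolute t-statistics), followed by a min-max rescale to 0-100. The model formula and the use of the built-in `mtcars` data are illustrative assumptions, not part of the original question.

```r
# Illustrative multiple regression on the built-in mtcars data
fit <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# Raw importance: absolute t-statistic of each slope (intercept dropped)
coefs <- summary(fit)$coefficients
raw_imp <- abs(coefs[-1, "t value"])

# Min-max rescale to 0-100 so the largest |t| maps to 100, the smallest to 0
scaled_imp <- 100 * (raw_imp - min(raw_imp)) / (max(raw_imp) - min(raw_imp))
round(scaled_imp, 1)
```

The raw `raw_imp` values are in t-statistic units and have no fixed upper bound; only the rescaled `scaled_imp` lies on 0-100.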

For LASSO and Ridge, by contrast, the values range from 0.00 to 0.99; they are calculated with this function:

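varImp <- function(object, lambda = NULL, ...) {
  # Coefficients at the penalty value(s) given by lambda (glmnet's s argument)
  beta <- predict(object, s = lambda, type = "coef")
  if (is.list(beta)) {
    # Multinomial fits return one coefficient matrix per class
    out <- do.call("cbind", lapply(beta, function(x) x[, 1]))
    out <- as.data.frame(out)
  } else {
    out <- data.frame(Overall = beta[, 1])
  }
  # Importance = absolute coefficient, with the intercept row removed
  out <- abs(out[rownames(out) != "(Intercept)", , drop = FALSE])
  out
}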

This function was taken from the question "Caret package - glmnet variable importance".
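The function above returns raw absolute coefficients and does no rescaling of its own. As far as I can tell, caret's `varImp()` for `train` objects min-max rescales to 0-100 by default (its `scale = TRUE` argument), which would explain why the linear model and GBM land on 0-100 while these values do not. A minimal sketch, assuming that scaling rule, applies the same step to each column of the function's output; `lasso_imp` and `scale_importance` are hypothetical names, and the numbers are placeholders, not results.

```r
# Hypothetical stand-in for the data frame returned by the varImp() function
# above: one "Overall" column of absolute coefficients, one row per predictor
lasso_imp <- data.frame(
  Overall = c(0.99, 0.00, 0.41, 0.12),
  row.names = c("x1", "x2", "x3", "x4")
)

# Min-max rescale every column to 0-100 (assumed to mirror caret's scaling)
scale_importance <- function(imp) {
  imp[] <- lapply(imp, function(v) 100 * (v - min(v)) / (max(v) - min(v)))
  imp  # assigning via imp[] keeps the predictor names as row names
}

scale_importance(lasso_imp)
```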

I looked at other questions on the forum but could not understand why the scales differ. How can I make these measures comparable?

Upvotes: 0

Views: 343

Answers (1)

sconfluentus

Reputation: 4993

If the goal is simply to compare them side by side, what matters is putting them all on a common scale and sorting them.

You can do this by rescaling every model's importance values onto the same fixed range (min-max rescaling), in this case 0 to 100.


importance_data <- c(-23, 12, 32, 18, 45, 1, 77, 18, 22)

# Min-max rescale x onto 0-100 (min -> 0, max -> 100), then sort ascending
new_scale <- function(x) {
  y <- (100 - 0) / (max(x) - min(x)) * (x - max(x)) + 100
  sort(y)
}

new_scale(importance_data)


#results
[1]   0  24  35  41  41  45  55  68 100
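Written out, `new_scale` is min-max normalization onto [0, 100]. The expression in the code simplifies as follows, and plugging in the example's minimum (-23) and maximum (77) for the input value 1 reproduces the second entry of the result:

```latex
y = \frac{100 - 0}{\max x - \min x}\,(x - \max x) + 100
  = \frac{100\,(x - \max x) + 100\,(\max x - \min x)}{\max x - \min x}
  = 100\,\frac{x - \min x}{\max x - \min x}

x = 1:\quad y = 100\,\frac{1 - (-23)}{77 - (-23)} = 100 \cdot \frac{24}{100} = 24
```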

This gives you a uniform scale. It does not mean that a 22 on one model's scale means exactly the same thing as a 22 on another's, but for relative comparison any common scale will do.

It gives you a standardized sense of how far apart the variables' importances are within each model, so you can compare the models side by side based on their relative scaled importances.
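Because `new_scale` sorts its output, it drops which value belongs to which variable. A minimal sketch of the side-by-side comparison that keeps the names, assuming each model's raw importances are available as a named numeric vector; `rescale_0_100`, `imp_lm`, `imp_lasso`, and `comparison` are hypothetical names, and the values are placeholders, not real results.

```r
# Min-max rescale a numeric vector onto 0-100, keeping names and order
rescale_0_100 <- function(x) {
  100 * (x - min(x)) / (max(x) - min(x))
}

# Hypothetical raw importances for the same four predictors from two models:
# |t| values from a linear model and |coefficients| from a penalized model
imp_lm    <- c(x1 = 4.2, x2 = 1.1, x3 = 2.5, x4 = 0.3)
imp_lasso <- c(x1 = 0.62, x2 = 0.00, x3 = 0.35, x4 = 0.05)

# Put both on the common 0-100 scale, aligned by variable name
comparison <- data.frame(
  lm    = rescale_0_100(imp_lm),
  lasso = rescale_0_100(imp_lasso[names(imp_lm)])
)

# Sort by the first model's scaled importance for a side-by-side view
comparison[order(-comparison$lm), ]
```

If every value in a model is identical, `max(x) - min(x)` is zero and the rescaling returns `NaN`, so that case needs separate handling.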

Upvotes: 1
