Johnny

Reputation: 869

Need to index a list in R

I trained a model with the following code:

set.seed(123)
xgbTree_model <- train(X_train,
                       y_train,
                       trControl = control,
                       method = "xgbTree",
                       metric = "RMSE",
                       preProcess = c("center","scale"),
                       importance = TRUE)

If I run this function:

varImp(xgbTree_model)

I am getting the following results:

> varImp(xgbTree_model)
xgbTree variable importance

  only 20 most important variables shown (out of 101)

                    Overall
OverallQual          100.00
GrLivArea             78.50
LotArea               30.31
TotalBsmtSF           27.49
Fireplaces            14.18
Age                    8.34
BsmtFinType1Unf        7.22
GarageYrBlt            5.73
CentralAirN            5.64
KitchenQualEx          5.42
KitchenQualTA          5.20
CentralAirY            4.20
BsmtQualTA             4.01
BsmtFinType1GLQ        3.84
NeighborhoodOldTown    1.96
Exterior1stBrkComm     1.88
BsmtFullBath           1.35
NeighborhoodIDOTRR     1.34
FoundationBrkTil       1.24
TotRmsAbvGrd           1.18
> 

I would like to use a for loop to grab the first column of names and use it to delete columns from my existing table. I am trying to get rid of all the columns whose Overall value falls below a cutoff in the list. I tried converting the list to a data.frame, but I lose the data I need because the conversion adds its own column names. This is the code I used:

corCol <- data.frame(matrix(unlist(l), nrow=length(l), byrow=T))

Is there a way in R for me to grab the left column from the varImp(xgbTree_model) function with a for loop?

Thank you for your support and recommendation.

Upvotes: 0

Views: 100

Answers (1)

Jagge

Reputation: 968

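The varImp object is a bit awkward because the 'first column' is actually the row names, not a real column. This has caused confusion for me in the past.

You can move the row names into a regular column of the data.frame with rownames_to_column() from the tibble package (the new column is named rowname by default):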

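library(dplyr)  # provides the %>% pipe used below

# $importance is a data.frame with the variable names as row names
varimps <- varImp(xgbTree_model)$importance
varimps <- varimps %>%
  tibble::rownames_to_column()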

After that, it is easy to extract or filter whatever you want.

For example, if you want to keep all the variables with a score above 10:

varimpsKeep <- varimps %>% dplyr::filter(Overall>10)

or extract the top n variables as a character vector:

n <- 10  # number of top variables to keep (example value)
varimps <- varimps %>%
  dplyr::arrange(dplyr::desc(Overall))
my_wanted_variables <- varimps$rowname[1:n]
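
To connect this back to the question, here is a minimal base R sketch of using those names to drop columns from an existing table without a for loop. The toy importance and X_train data frames and the cutoff value are hypothetical stand-ins, not the asker's real data, and the sketch assumes the importance names match the column names of the table.

```r
# Toy stand-in for varImp(xgbTree_model)$importance: a data frame whose
# row names are the variable names and whose single column is Overall.
importance <- data.frame(
  Overall = c(100.00, 78.50, 30.31, 8.34, 1.18),
  row.names = c("OverallQual", "GrLivArea", "LotArea", "Age", "TotRmsAbvGrd")
)

# Toy stand-in for the existing table (hypothetical values).
X_train <- data.frame(
  OverallQual  = c(7, 6, 8),
  GrLivArea    = c(1710, 1262, 1786),
  LotArea      = c(8450, 9600, 11250),
  Age          = c(5, 31, 7),
  TotRmsAbvGrd = c(8, 6, 6)
)

cutoff <- 10  # hypothetical threshold on Overall

# The variable names live in the row names, so logical indexing is enough.
keep <- rownames(importance)[importance$Overall > cutoff]

# Guard against importance names that are not columns of the table.
keep <- intersect(keep, colnames(X_train))

# drop = FALSE keeps a data frame even if only one column survives.
X_reduced <- X_train[, keep, drop = FALSE]
print(colnames(X_reduced))
```

If the importance names are dummy-variable names such as CentralAirN while the table still holds the original factor column, the intersect() step will silently skip them, so it is worth checking which names were dropped.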

Upvotes: 1
