MLR package: generateFilterValuesData chi.squared and information.gain

Question

I am experimenting with the mlr package and would like to compute chi-squared and information-gain filter values for my features.

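library(mlr)
library(FSelector)
library(mlbench)  # provides the PimaIndiansDiabetes data set

data(PimaIndiansDiabetes)
# Draw a random 60% training sample
indi <- sample(seq_len(nrow(PimaIndiansDiabetes)), 0.6 * nrow(PimaIndiansDiabetes))
train <- PimaIndiansDiabetes[indi, ]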

trainTask <- makeClassifTask(data = train, target = "diabetes", positive = "pos")

# Feature importance
im_feat <- generateFilterValuesData(trainTask, method = c("information.gain", "chi.squared"))
plotFilterValues(im_feat)
im_feat

The output shows zeros for both information.gain and chi.squared for the variables triceps and pressure, and I am not sure how to interpret that. Does it mean I should leave these variables out when building a model (e.g. a random forest)?

When I use

tbl <- table(train$triceps, train$diabetes)
chisq.test(tbl)

it reports a chi-squared statistic of 60.473. Why is it not 0? What is the difference between chisq.test and the chi.squared filter method in mlr?
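
One way to see how the two numbers can differ is sketched below in base R. It rests on assumptions not stated in the post: that the chi.squared filter (which mlr delegates to FSelector) reports Cramér's V rather than the raw test statistic, and that numeric features are discretized first, so a feature for which no cut point is found ends up in a single bin and scores 0. The helper cramers_v and the synthetic x and y are hypothetical and only for illustration.

```r
# Hypothetical helper (not part of mlr or FSelector):
# Cramer's V computed from a two-way contingency table.
cramers_v <- function(tbl) {
  # Drop empty rows/columns so that no expected count is zero.
  tbl <- tbl[rowSums(tbl) > 0, colSums(tbl) > 0, drop = FALSE]
  # A feature that falls into a single bin cannot separate the classes.
  if (nrow(tbl) < 2 || ncol(tbl) < 2) return(0)
  chi2 <- suppressWarnings(chisq.test(tbl, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(tbl) * (min(dim(tbl)) - 1))))
}

set.seed(1)
n <- 200
y <- factor(sample(c("neg", "pos"), n, replace = TRUE))  # synthetic class label
x <- round(rnorm(n, mean = 30, sd = 10))                 # synthetic numeric feature, unrelated to y

# chisq.test on the raw values: every distinct value of x is its own category.
raw_tbl  <- table(x, y)
raw_stat <- unname(suppressWarnings(chisq.test(raw_tbl)$statistic))
v_raw    <- cramers_v(raw_tbl)

# Assumed filter behaviour when discretization finds no cut point: one bin only.
one_bin <- table(cut(x, breaks = c(-Inf, Inf)), y)
v_bin   <- cramers_v(one_bin)

print(raw_stat)  # positive although x carries no information about y
print(v_raw)     # normalized to the range [0, 1]
print(v_bin)     # 0
```

Under these assumptions, chisq.test(table(train$triceps, train$diabetes)) treats every distinct triceps value as a separate category, so its statistic is generally positive even for an uninformative feature, while the filter would score the discretized feature. A filter value of 0 is therefore a hint that the feature carries little univariate information, not proof that a model such as a random forest cannot use it.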
