Reputation: 55
I'm trying to calculate weights of a dataset in R by using the FSelector package. The data is taken from this location.
data = read.csv("filepath/Indian Liver Patient Dataset (ILPD).csv")
names(data)<-c("Age","Gender", "TB", "DB", "Alkphos", "Sgpt", "Sgot", "TP", "ALB", "A/G Ratio", "Selector")
library(FSelector)
weights <- gain.ratio(Selector ~., data)
print(weights)
I can't calculate all of the weights. When I use the gain.ratio function, the Age weight is NaN. When I use the chi.squared function instead, both Age and A/G Ratio are zero. When I take the first 200 rows from data and calculate the weights, only five of them are calculated correctly, and the others are zero or NaN.
I tried removing rows with missing values using data <- na.omit(data), but it didn't change the result.
How can I calculate weights correctly?
Below is an example of a weight print.
Age 0.0000000
Gender 0.1304229
TB 0.3281865
DB 0.3238010
Alkphos 0.2965842
Sgpt 0.2734633
Sgot 0.3120432
TP 0.2504747
ALB 0.3051724
A/G Ratio 0.0000000
Upvotes: 0
Views: 808
Reputation: 109242
Zero is a valid value for feature importance -- it means that the feature does not have any information with respect to the classification target. The NaNs are caused by a bug in FSelector that divides by 0 if a feature carries no information. I've fixed this in the development version.
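To illustrate where such a NaN can come from, here is a minimal base-R sketch of the textbook gain ratio (information gain divided by the entropy of the feature). It is not FSelector's actual code; the helper names `entropy` and `gain_ratio_sketch` and the toy vectors are made up for this example. The point is only that a feature with a single distinct value has zero entropy and zero information gain, so the ratio becomes 0/0.

```r
# Shannon entropy (natural log) of a discrete vector.
entropy <- function(x) {
  p <- table(x) / length(x)
  p <- p[p > 0]
  -sum(p * log(p))
}

# Textbook gain ratio: information gain divided by the feature's own entropy.
# Hypothetical helper for illustration, not part of FSelector.
gain_ratio_sketch <- function(feature, target) {
  h_target  <- entropy(target)
  h_feature <- entropy(feature)
  h_joint   <- entropy(paste(feature, target, sep = "|"))
  info_gain <- h_target + h_feature - h_joint
  info_gain / h_feature  # 0 / 0 when the feature carries no information
}

target      <- c("a", "a", "b", "b")
informative <- c("x", "x", "y", "y")  # perfectly predicts the target
constant    <- c("z", "z", "z", "z")  # a single value, so no information

print(gain_ratio_sketch(informative, target))
print(gain_ratio_sketch(constant, target))  # NaN, from 0 / 0
```

A numeric column such as Age could plausibly end up in the same situation if it is discretized into a single bin before the entropy is computed, though that is an assumption about the internals rather than something shown here.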
The name "A/G Ratio" is not a valid R identifier and therefore causes problems with some of the methods. Below is the code that renames the column and installs the development version of FSelector.
data = read.csv("Indian Liver Patient Dataset (ILPD).csv")
names(data)<-c("Age","Gender", "TB", "DB", "Alkphos", "Sgpt", "Sgot", "TP", "ALB", "AGRatio", "Selector")
library(devtools)
install_github("larskotthoff/fselector")
library(FSelector)
weights = gain.ratio(Selector~., data)
print(weights)
weights = chi.squared(Selector~., data)
print(weights)
Output:
attr_importance
Age 0.00000000
Gender 0.01539699
TB 0.09711392
DB 0.11547683
Alkphos 0.06593879
Sgpt 0.06566624
Sgot 0.07667241
TP 0.08836895
ALB 0.07766682
AGRatio 0.15403574
attr_importance
Age 0.0000000
Gender 0.1304229
TB 0.3281865
DB 0.3238010
Alkphos 0.2965842
Sgpt 0.2734633
Sgot 0.3120432
TP 0.2504747
ALB 0.3051724
AGRatio 0.0000000
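As an alternative to renaming the column by hand, base R's make.names() can turn arbitrary strings into syntactically valid names. The sketch below uses a small made-up data frame (`df` and its values are placeholders, not the ILPD data) just to show the effect on a name like "A/G Ratio".

```r
# Column names as they might be assigned, including one invalid identifier.
raw_names <- c("Age", "Gender", "A/G Ratio", "Selector")

# make.names() replaces characters that are not allowed in identifiers with ".".
clean_names <- make.names(raw_names, unique = TRUE)
print(clean_names)

# Applying it to a toy data frame (placeholder values).
df <- data.frame(a = c(65, 62), b = c(0.9, 0.74))
names(df) <- c("Age", "A/G Ratio")
names(df) <- make.names(names(df), unique = TRUE)
print(names(df))
```

The resulting name contains dots ("A.G.Ratio") rather than matching the hand-picked "AGRatio" above, so later code has to refer to whichever form is actually used.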
Upvotes: 2