Reputation: 2482
I am new to machine learning, but I am trying to build a prediction model. All the variables in my training set are categorical:
PREDICTOR_1          PREDICTOR_2          PREDICTOR_3
Found        : 5     Best Match   : 2     Found, Supplier site: 5
No result    : 2     Found        : 8     Found, Zone site    : 1
Part NotFound:11     Not Found WDA: 8     No Data Found       :12

PREDICTOR_4          PREDICTOR_5                      PREDICTOR_6
No result   :11      Found with Different length: 1   High     :10
Search begin: 7      No result                  :16   LOW      : 4
                     Part Found With out Suffix : 1   No result: 4

PREDICTOR_7          PREDICTOR_8     PREDICTOR_9               RESULT
Direct_Match: 8      NO        : 8   Mpn Found within PCN: 3   Found    :15
No result   :10      YES       : 8   Mpn has no PCN      :15   Not Found: 3
                     YES-REMOVE: 2
I tried to use R's glm() function, but I keep getting
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
1. Is it possible to train the model using only categorical data?
2. What does this warning mean?
Upvotes: 0
Views: 263
Reputation: 8044
1. Yes, it is possible to use only categorical data in a glm model.
2. This warning occurs when one or more explanatory variables predict the response perfectly (or almost perfectly), for example when a variable has a correlation of 1 or -1 with the response. I suggest first removing explanatory variables that are correlated with other explanatory variables, and then removing any explanatory variable whose correlation with the response equals 1 or -1. This can be checked with the cor function in R. I suggest the Kendall correlation coefficient for categorical data. Note that cor needs numeric input, so convert the factors to integer codes first, e.g. cor(data.matrix(data), method = "kendall")
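As a rough sketch of both points, the snippet below uses made-up toy data (the column names pred_a, pred_b and the helper has_zero_cell are hypothetical, not taken from the question). It fits glm with a factor predictor only, and then cross-tabulates each predictor against the response to look for levels that occur with just one outcome. This is a simpler check than the correlation approach, and it only inspects one predictor at a time, so it will not catch combinations of predictors that jointly predict the response.

```r
# Toy data, invented for illustration: 18 rows, all variables are factors.
toy <- data.frame(
  pred_a = factor(rep(c("Found", "No result", "Part NotFound"), each = 6)),
  pred_b = factor(rep(c("YES", "NO"), times = 9)),
  result = factor(rep(c("Found", "Not Found"), times = c(12, 6)))
)

# Point 1: glm accepts factor predictors directly (they are dummy-coded).
# With a factor response, the first level ("Found") is treated as failure.
fit <- glm(result ~ pred_b, data = toy, family = binomial)

# Point 2: a predictor level that only ever occurs with one outcome shows
# up as a zero cell in the predictor-by-response table.
has_zero_cell <- function(predictor, response) {
  any(table(predictor, response) == 0)
}

flags <- sapply(toy[c("pred_a", "pred_b")], has_zero_cell,
                response = toy$result)

# Inspect the table of a flagged predictor to see which levels are affected.
print(table(toy$pred_a, toy$result))
print(flags)
```

In this toy data pred_a is flagged because each of its levels appears with only one value of result, while pred_b is not. Predictors flagged this way are candidates for removal or for merging sparse levels before fitting.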
Upvotes: 1