Exorcismus

Reputation: 2482

Building prediction model using categorical data in R

I am new to machine learning, but I am trying to build a prediction model. All the variables in my training set are categorical:

PREDICTOR_1     PREDICTOR_2              PREDICTOR_3
 Found        : 5    Best Match   :2        Found, Supplier site: 5   
 No result    : 2    Found        :8        Found, Zone site    : 1   
 Part NotFound:11    Not Found WDA:8        No Data Found       :12   
    PREDICTOR_4                       PREDICTOR_5   PREDICTOR_6
 No result   :11      Found with Different length: 1   High     :10    
 Search begin: 7      No result                  :16   LOW      : 4    
                      Part Found With out Suffix : 1   No result: 4    
     PREDICTOR_7   PREDICTOR_8                PREDICTOR_9       RESULT  
 Direct_Match: 8      NO        :8      Mpn Found within PCN: 3   Found    :15  
 No result   :10      YES       :8      Mpn has no PCN      :15   Not Found: 3  
                      YES-REMOVE:2 

I tried to use R's glm() function, but I keep getting

Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred 

1- I want to know whether it is possible to train the model using only categorical data
2- What does this warning mean?

Upvotes: 0

Views: 263

Answers (1)

Marcin

Reputation: 8044

1. Yes, it is possible to use only categorical data in a glm model.
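As a minimal sketch of the point above, the following fits a logistic regression on categorical predictors only. The data frame `toy`, its columns `p1` and `p2`, and the level names are hypothetical stand-ins, not the asker's data; the outcome is generated with noise so that no level predicts it perfectly.

```r
# Hypothetical toy data: two categorical predictors and a binary outcome.
set.seed(1)
n <- 200
toy <- data.frame(
  p1 = factor(sample(c("Found", "No result", "Part NotFound"), n, replace = TRUE)),
  p2 = factor(sample(c("NO", "YES"), n, replace = TRUE))
)

# The outcome depends on p2, with noise, so no level separates it perfectly.
prob <- ifelse(toy$p2 == "YES", 0.7, 0.3)
toy$result <- factor(ifelse(runif(n) < prob, "Found", "Not Found"))

# family = binomial gives logistic regression; glm() expands each factor
# into indicator (dummy) columns on its own.
fit <- glm(result ~ p1 + p2, data = toy, family = binomial)

# Inspect the indicator columns that were generated from the factors.
print(colnames(model.matrix(fit)))
```

The first level of each factor acts as the reference category, so a factor with k levels contributes k - 1 coefficients.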

2. This warning occurs when one or more explanatory variables have a correlation of 1 or -1 with the response variable, so that the response can be predicted exactly (a situation known as perfect separation). I suggest first removing explanatory variables that are correlated with other explanatory variables, and then removing any explanatory variable whose correlation with the response variable is 1 or -1. This can be checked with the cor function in R. For categorical data I suggest the Kendall correlation coefficient. Try cor(data, method = "kendall"). Note that cor requires numeric input, so factor columns have to be converted to numeric codes first.
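One way to carry out this check is sketched below, assuming a hypothetical data frame `toy` in which predictor `a` determines the outcome exactly and predictor `b` does not. It first cross-tabulates each predictor against the response, then computes the Kendall correlation on integer codes of the factors. The integer codes impose an arbitrary ordering on factors with more than two levels, so the correlation is only a rough screen in that case.

```r
# Hypothetical toy data: `a` perfectly separates the outcome, `b` does not.
toy <- data.frame(
  a = factor(c("x", "x", "x", "y", "y", "y", "y", "x")),
  b = factor(c("u", "v", "u", "v", "u", "v", "u", "v")),
  result = factor(c("Found", "Found", "Found", "Not Found",
                    "Not Found", "Not Found", "Not Found", "Found"))
)

# Cross-tabulate each predictor against the response.
predictors <- c("a", "b")
tabs <- lapply(toy[predictors], function(x) table(x, toy$result))

# Flag a predictor if any of its levels occurs with only one outcome value.
separates <- sapply(tabs, function(tab) any(rowSums(tab > 0) == 1))
print(separates)

# cor() needs numeric input, so convert each factor to its integer codes.
codes <- data.frame(lapply(toy, as.integer))
kendall <- cor(codes, method = "kendall")
print(round(kendall, 2))
```

A flagged predictor, or one whose correlation with `result` has absolute value 1, is a candidate for removal or for merging sparse levels.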

Upvotes: 1
