user121

Reputation: 325

party package for decision tree in R does not support character data type?

If one of the columns in my data frame is of data type character, I get the error below.

> library("party")
> r2 <- ctree(Sepal.Length ~ .,data=df)
Error in trafo(data = data, numeric_trafo = numeric_trafo, factor_trafo = factor_trafo,  : 
  data class character is not supported
> plot(r2)    
> sapply(df,class)
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
    "factor"     "factor"     "factor"  "character"     "factor" 

Sometimes, I also get this error

 Error in match.arg(type) : 
  'arg' should be one of “response”, “node”, “prob”
> sapply(df,class)
          AGE        GENDER          STAY      GRADE          XYNS        CHARGE 
    "integer"     "integer"      "factor"     "integer"     "integer"     "integer" 

How do I get around these?

Upvotes: 5

Views: 14117

Answers (1)

Achim Zeileis

Reputation: 17168

The scale of the response variable and all explanatory variables is important for two aspects of the CTree algorithm: (1) The association tests that are carried out in each node to determine which variable should be used for splitting. (2) The selection of the best split point in a given explanatory variable.

The association tests always capture "correlation" or "lack of independence" between the response and each explanatory variable, and the appropriate correlation measure depends on the scale of the variables involved (see this post on Cross Validated: https://stats.stackexchange.com/questions/144143). The variables can be numeric (or integer), unordered categorical (i.e., factor), ordered categorical, or censored (Surv objects). Selecting an appropriate type for each variable in a data frame is crucial to obtaining meaningful results from the tree.
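As a rough sketch of what selecting a variable type can look like in base R, the snippet below recodes three columns of a small made-up data frame. The data frame `d`, its column names, and the level labels are hypothetical and only serve as illustration; which type is right for a real column depends on what it measures.

```r
# Hypothetical data frame, for illustration only.
d <- data.frame(
  age    = c("23", "35", "41", "58"),   # numbers that ended up as a factor
  gender = c(1L, 2L, 2L, 1L),           # integer codes for categories
  grade  = c("low", "high", "mid", "low"),
  stringsAsFactors = TRUE
)

# Numbers stored as a factor: go through character to recover the values,
# because as.numeric() on a factor returns the internal level codes.
d$age <- as.numeric(as.character(d$age))

# Integer codes that really are unordered categories (labels are made up).
d$gender <- factor(d$gender, levels = c(1, 2), labels = c("group1", "group2"))

# Categories with a natural order become an ordered factor.
d$grade <- factor(d$grade, levels = c("low", "mid", "high"), ordered = TRUE)

# Inspect the first class of every column.
sapply(d, function(x) class(x)[1])
```

Checking `sapply(d, class)` before fitting, as done in the question, is a quick way to spot columns whose type does not match their meaning.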

Similarly, the possible binary splits in a given variable depend crucially on its scale. Character is not a scale for which there is a standard way to assess correlation or splits, so a character column has to be converted to a suitable type (typically a factor) before it is passed to ctree().
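A minimal sketch of one way around the first error, assuming the character columns really are unordered categories: convert them to factors before fitting. The data frame here is a copy of the built-in `iris` data with `Species` turned into character, which only mimics the situation in the question and is not the asker's actual data.

```r
# Copy of iris with one character column (an assumption made for illustration).
df <- iris
df$Species <- as.character(df$Species)

# Convert every character column to a factor.
is_chr <- sapply(df, is.character)
df[is_chr] <- lapply(df[is_chr], factor)

# Confirm that no character columns remain.
sapply(df, class)

# Fit the tree only if the party package is installed.
if (requireNamespace("party", quietly = TRUE)) {
  r2 <- party::ctree(Sepal.Length ~ ., data = df)
}
```

If a character column actually holds free text or identifiers rather than categories, turning it into a factor is probably not meaningful, and dropping it from the formula may be the better choice.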

Upvotes: 1
