Reputation: 325
If one of the columns in my data frame is of data type character, I get the error below.
> library("party")
> r2 <- ctree(Sepal.Length ~ .,data=df)
Error in trafo(data = data, numeric_trafo = numeric_trafo, factor_trafo = factor_trafo, :
data class character is not supported
> plot(r2)
> sapply(df,class)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
"factor" "factor" "factor" "character" "factor"
Sometimes I also get this error:
Error in match.arg(type) :
'arg' should be one of “response”, “node”, “prob”
> sapply(df,class)
AGE GENDER STAY GRADE XYNS CHARGE
"integer" "integer" "factor" "integer" "integer" "integer"
How do I get around these?
Upvotes: 5
Views: 14117
Reputation: 17168
The scale of the response variable and all explanatory variables is important for two aspects of the CTree algorithm: (1) The association tests that are carried out in each node to determine which variable should be used for splitting. (2) The selection of the best split point in a given explanatory variable.
The association tests always capture "correlation" or "lack of independence" between the response and each explanatory variable, and the type of correlation measure depends on the scale of the variables involved (see this post on Cross Validated: https://stats.stackexchange.com/questions/144143). The variables can be numeric (or integer), unordered categorical (i.e., factor), ordered categorical, or censored (Surv objects). Selecting an appropriate type for each variable in a data frame is crucial for obtaining meaningful results from the tree.
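As a rough illustration of choosing variable types, here is a minimal sketch on a small made-up data frame. The column names, the integer codings, and the labels "a" and "b" are all hypothetical; whether a coded variable is really unordered or ordered depends on what the codes mean in your data.

```r
# Hypothetical data: all three columns arrive as integers.
d <- data.frame(
  age    = c(34L, 51L, 27L, 45L),  # genuinely numeric, keep as is
  gender = c(1L, 2L, 2L, 1L),      # assumed to be a code for two unordered groups
  grade  = c(2L, 3L, 1L, 2L)       # assumed to be a code with a natural order
)

# Unordered categorical: plain factor (labels are placeholders).
d$gender <- factor(d$gender, levels = c(1, 2), labels = c("a", "b"))

# Ordered categorical: ordered factor with levels 1 < 2 < 3.
d$grade <- factor(d$grade, levels = 1:3, ordered = TRUE)

# Show the leading class of each column.
sapply(d, function(x) class(x)[1])
```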
Similarly, the set of possible binary splits in a given variable depends crucially on its scale. Character is not a scale for which there is a standard way to assess correlation or splits.
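One possible workaround for the first error, assuming the character column really represents unordered categories, is to convert it to a factor before fitting. The sketch below uses the built-in iris data with Species deliberately turned into character to mimic the situation; it is not the asker's actual data frame.

```r
# Mimic the problem: a character column among the explanatory variables.
df <- iris
df$Species <- as.character(df$Species)

# Convert every character column to an unordered factor.
is_chr <- sapply(df, is.character)
df[is_chr] <- lapply(df[is_chr], factor)

# Check the resulting column classes.
sapply(df, class)

# Fit the tree only if the party package is installed.
if (requireNamespace("party", quietly = TRUE)) {
  library("party")
  r2 <- ctree(Sepal.Length ~ ., data = df)
  print(r2)
}
```

If the categories have a natural order, an ordered factor may be the more appropriate choice than a plain factor.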
Upvotes: 1