Reputation: 1119
I'm having a few issues running a simple decision tree within R using rpart.
I can't post my actual data for an example because of confidentiality, but here's the structure. I've blanked out a load of bits just because I've got my tin foil hat on today.
I've run the most basic model to predict MIX from MIX_BEFORE and LIFESTAGE, but I don't get a tree out of the end of it. I've tried using rpart.control and specifying minsplit, but it makes no difference.
Even when I add in a few more variables I still don't get a tree:
Yet... the second I remove the factor variables and attempt to build a tree using an integer, it works fine:
Any ideas at all?
Upvotes: 0
Views: 914
Reputation: 37641
Your data has a fairly strong class imbalance: 99% one class, 1% the other. So rpart can get 99% accuracy just by predicting the majority class for everything, which is what it is doing. Most variables cannot discriminate better than that, so you get trees with no branches, as you did with the factor variables. Your _NBR variable happens to be more predictive for the small number of points with _NBR >= 7, but even the model that uses _NBR predicts that almost all points are in the majority class. You may be able to get some help from this Cross Validated post on how to deal with class imbalance.
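To see the effect for yourself, here is a minimal sketch on made-up data, since the real data can't be posted. Only the column names MIX and LIFESTAGE come from the question; the level names, the counts, and the choice of equal priors are assumptions for illustration. Passing `parms = list(prior = ...)` is one common way to make rpart weigh the rare class more heavily. It is not necessarily what the linked post recommends, and it trades overall accuracy for sensitivity to the minority class.

```r
library(rpart)

# Hypothetical stand-in data: 10,000 rows, 1% minority class "B".
# The minority class is concentrated in one LIFESTAGE level but is
# never the majority within any level.
stage <- rep(c("young", "family", "older", "retired"), each = 2500)
minor <- c(75, 10, 10, 5)  # number of "B" rows per level (assumed)
mix   <- unlist(lapply(minor, function(k) rep(c("B", "A"), times = c(k, 2500 - k))))

d <- data.frame(
  LIFESTAGE = factor(stage),
  MIX       = factor(mix, levels = c("A", "B"))
)

# Check the class balance first.
prop.table(table(d$MIX))

# Default fit: no split lowers the misclassification count, so the
# tree should collapse to the root node alone.
fit_default <- rpart(MIX ~ LIFESTAGE, data = d, method = "class")
print(fit_default)

# Same model, but treating both classes as equally likely a priori.
# The prior vector follows the order of levels(d$MIX).
fit_prior <- rpart(MIX ~ LIFESTAGE, data = d, method = "class",
                   parms = list(prior = c(0.5, 0.5)))
print(fit_prior)

# Compare fitted classes with the actual ones.
table(predicted = predict(fit_prior, type = "class"), actual = d$MIX)
```

With the prior changed, expect many more majority-class rows to be flagged as the minority class. Whether that is acceptable depends on what a missed minority case costs you compared with a false alarm.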
Upvotes: 0