roastbeeef

Reputation: 1119

R rpart only working for integers and not factors? Getting a tree with no depth

I'm having a few issues running a simple decision tree within R using rpart.

I can't post my actual data for an example because of confidentiality, but here's the structure. I've blanked out a load of bits just because I've got my tin foil hat on today.

[Image: data structure]

I've run the most basic model to predict MIX based on MIX_BEFORE and LIFESTAGE, and I don't get a tree out of the end of it. I've tried using rpart.control and specifying minsplit, but it makes no difference.

[Image: first tree results]

Even when I add in a few more variables I still don't get a tree.

Yet... the second I remove the factor variables and attempt to build a tree using an integer, it works fine.

Any ideas at all?

Upvotes: 0

Views: 914

Answers (1)

G5W

Reputation: 37641

Your data has a fairly strong class imbalance: 99% one class, 1% the other. So rpart can get 99% accuracy just by predicting the majority class for everything, which is what it is doing. Most variables will not be able to discriminate better than that, so you get trees with no branches, as you did with the factor variables. Your _NBR variable happens to be more predictive for the small number of points with _NBR >= 7, but even your model that uses _NBR predicts that almost all points are in the majority class. You may be able to get some help from this Cross Validated post on how to deal with class imbalance.
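
To make the effect concrete, here is a minimal sketch on synthetic data (not the asker's data, which was never posted). The column names MIX and LIFESTAGE are borrowed from the question as stand-ins, and the factor levels, class rates, and the `toy` data frame are made up for illustration. It fits a default tree, which should collapse to the root, and then refits with equal class priors via rpart's `parms` argument. Reweighting the priors is only one common way to handle imbalance, not necessarily what the linked post recommends.

```r
library(rpart)

set.seed(42)
n <- 5000

# Synthetic stand-in data: one factor predictor, about 1% positives overall.
toy <- data.frame(
  LIFESTAGE = factor(sample(c("A", "B", "C", "D"), n, replace = TRUE))
)

# Level "D" is several times more likely to be positive than the others,
# but positives remain a small minority in every level.
p_yes <- ifelse(toy$LIFESTAGE == "D", 0.03, 0.005)
toy$MIX <- factor(ifelse(runif(n) < p_yes, "yes", "no"))

# Check the class balance first.
print(prop.table(table(toy$MIX)))

# Default fit: every candidate leaf would still predict "no", so no split
# reduces the misclassification risk and the tree stays at the root.
fit_default <- rpart(MIX ~ LIFESTAGE, data = toy, method = "class")
print(fit_default)

# Same model, but treating both classes as equally likely a priori.
# The prior vector follows the factor level order of MIX ("no", "yes").
fit_prior <- rpart(MIX ~ LIFESTAGE, data = toy, method = "class",
                   parms = list(prior = c(0.5, 0.5)))
print(fit_prior)

# Number of nodes in each tree (a root-only tree has one row in $frame).
cat("nodes, default priors:", nrow(fit_default$frame), "\n")
cat("nodes, equal priors:  ", nrow(fit_prior$frame), "\n")
```

With equal priors the tree is free to label the higher-risk group as the minority class, so it splits, at the cost of many more false positives. Whether that trade-off is acceptable depends on what the predictions are used for.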

Upvotes: 0
