Reputation: 21
I am trying to code a two class classification DT problem that I used SAS EM before. But trying to do it in Sklearn. The target variable is a two class categorical variable. But there are a few continuous independent variables. In SAS I could specify the "Maximum Number of Branches" for each split. So when it is set to 4, some leaf will split into 2 and some in 4 (especially for continuous variables). I could not find an equivalent parameter in sklearn. Looked at "max_leaf-nodes". But that controls the total number of "leaf" nodes of the entire tree. I am sure some of you probably has faced the same situation and already found a solution. Please help/share. I will really appreciate it.
Upvotes: 2
Views: 2311
Reputation: 1804
I don't think this option is available in sklearn, You will find this Post very useful for your Classification DT; as it lists all the options you have available.
I would recommend creating Bins for your continues variables; this way you force the branches to be the number of bins you have.
Example: For continuous variable COl1 has values between 1-100; you can create a 4 bins 1-25, 26-50 , 51-75, 76-100. or you can create the bins bases on the median.
Upvotes: 1