Reputation: 1
I am a complete noob, so this is a fairly basic question. I'm looking to build a decision tree-based classification model in SAS.
I'm unable to embed pictures in my question nor am I able to attach images, but I have a dataset that I am working with.
I'm trying to build this decision tree using the HPSPLIT procedure in SAS, but it's not working. I think it's because:
(1) I am not using all of the categorical variables
(2) I have missing values in the "node-caps" column: the available values are yes, no, and ?. I think I should be using the ASSIGNMISSING= option, but I'm not sure.
Here is my current code:
proc hpsplit data=bcancer seed=1;
class class;
model class = Age Menopause tumor_size inv_nodes node_caps deg_malig breast breast_quad irradiat;
grow entropy;
prune costcomplexity;
run;
I think that I should be:
(1) Adding more variables to the second row (as they are categorical)
(2) Adding the ASSIGNMISSING= option to account for missing values in one column. See link: https://i.sstatic.net/ltR3S.png
NOTE: The ASSIGNMISSING= option has not been specified. Because of this, all observations with
missing values in the explanatory variables will be excluded from tree construction.
ERROR: Character variable appeared on the MODEL statement without appearing on a CLASS statement.
ERROR: Unable to create a usable predictor variable set.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE HPSPLIT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
For reference, this is the error I see in the log. Any help would be greatly appreciated!
Upvotes: 0
Views: 1471
Reputation: 21274
Categorical variables need to be listed in the CLASS statement, and it looks like most of your variables are categorical. That is what the first ERROR in your log is complaining about. Continuous variables should be numeric, but I don't see any in your data.
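One quick way to see which variables are character and which are numeric is PROC CONTENTS. A minimal sketch, assuming the dataset is named bcancer as in the question:

```sas
/* List every variable in the dataset with its type (Char or Num). */
/* Any variable reported as Char has to appear on the CLASS statement. */
proc contents data=bcancer;
run;
```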
Because node_caps is a character variable, the value ? will be treated as a level of its own, not as missing. Do you want those values coded as missing, or included as their own level of that variable?
proc hpsplit data=bcancer seed=1;
class class age menopause tumor_size inv_nodes node_caps deg_malig breast breast_quad irradiat;
model class = Age Menopause tumor_size inv_nodes node_caps deg_malig breast breast_quad irradiat;
grow entropy;
prune costcomplexity;
run;
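If you would rather treat ? as missing, one possible approach is to recode it to a blank (the missing value for character variables) in a DATA step and then tell HPSPLIT how to handle missing values with ASSIGNMISSING=. This is only a sketch: the output dataset name bcancer_clean is made up for illustration, and ASSIGNMISSING=SIMILAR is just one of the documented choices (POPULAR and BRANCH are others), so check which one fits your analysis.

```sas
/* Recode "?" in node_caps to a blank, which SAS treats as a missing character value. */
/* bcancer_clean is a hypothetical name chosen for this example. */
data bcancer_clean;
    set bcancer;
    if node_caps = '?' then node_caps = ' ';
run;

/* Same model as above, with an explicit rule for missing values. */
/* ASSIGNMISSING=SIMILAR is an assumption here, not a recommendation. */
proc hpsplit data=bcancer_clean seed=1 assignmissing=similar;
    class class age menopause tumor_size inv_nodes node_caps deg_malig breast breast_quad irradiat;
    model class = age menopause tumor_size inv_nodes node_caps deg_malig breast breast_quad irradiat;
    grow entropy;
    prune costcomplexity;
run;
```

If you leave the ? values alone instead, no recoding is needed and node_caps simply has three levels: yes, no, and ?.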
Upvotes: 1