Reputation: 3
   ID Ethnicity MaritalStatus EmploymentStatus type
1  10         5             3                1    3
2  24         1             2                2    1
3  30         1             1                3    4
4  35         2             2                2    3
5  40         5             1                3    4
6  57         1             2                4    1
This is my sample data. The table has almost 94,000 rows. I tried the following command:
m1 <- rpart(type ~ Ethnicity, MaritalStatus, EmploymentStatus,
data = table2, method = "anova")
My objective is to predict 'type' based on Ethnicity, MaritalStatus and EmploymentStatus.
All the variables were converted to factors using as.factor(),
but the partition has taken place by ID, whereas I want the partition to happen by Ethnicity, then MaritalStatus and EmploymentStatus. I tried removing the ID column from the data frame, but the same problem exists.
I have attached an image of the results I get and the corresponding rpart.plot.
Is my datatype or my basic approach to the data wrong?
I am a beginner in machine learning. I also tried changing the datatype of ID to numeric.
Is there any way to set a hierarchy for the partition?
Why is the graph just a line?
Upvotes: 0
Views: 231
Reputation: 1176
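There is an error in your formula. Predictor variables should be separated by + instead of a comma. In your call, only the first predictor (Ethnicity) is part of the formula; MaritalStatus and EmploymentStatus are passed to rpart() as separate arguments and are not used as predictors.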
m1 <- rpart(type ~ Ethnicity + MaritalStatus + EmploymentStatus,
data = table2, method = "anova")
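To check which predictors a formula actually contains, here is a minimal sketch. It assumes a synthetic stand-in for table2 (random values, not your data) and keeps type numeric so that method = "anova" applies. The names n, f_comma and predictors are only illustrative.

```r
library(rpart)

# Hypothetical stand-in for table2: random values, not the real data.
set.seed(1)
n <- 200
table2 <- data.frame(
  ID               = seq_len(n),
  Ethnicity        = factor(sample(1:5, n, replace = TRUE)),
  MaritalStatus    = factor(sample(1:3, n, replace = TRUE)),
  EmploymentStatus = factor(sample(1:4, n, replace = TRUE)),
  type             = sample(1:4, n, replace = TRUE)  # numeric, to match "anova"
)

# The formula as written in the question ends at the first comma.
f_comma <- type ~ Ethnicity
print(all.vars(f_comma))

# Predictors joined with `+`; ID is not in the formula, so it cannot be split on.
m1 <- rpart(type ~ Ethnicity + MaritalStatus + EmploymentStatus,
            data = table2, method = "anova")

# The model terms list the predictors the tree is allowed to split on.
predictors <- attr(terms(m1), "term.labels")
print(predictors)
```

If predictors lists all three columns, the formula was parsed as intended. Whether the tree actually splits on each of them still depends on the data and on the rpart control settings.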
Upvotes: 1