Diso daphney

Reputation: 3

rpart function is overplotting, or the desired partition is not achieved

  ID Ethnicity MaritalStatus EmploymentStatus type
1 10         5             3                1    3
2 24         1             2                2    1
3 30         1             1                3    4
4 35         2             2                2    3
5 40         5             1                3    4
6 57         1             2                4    1

This is my sample data; the full table has almost 94,000 rows. I tried the following command:

m1 <- rpart(type ~ Ethnicity, MaritalStatus, EmploymentStatus, 
      data = table2, method = "anova")

My objective is to predict type from Ethnicity, MaritalStatus and EmploymentStatus. All the variables were converted to factors using as.factor(), but the partition takes place by ID, whereas I want it to happen by Ethnicity, then MaritalStatus and EmploymentStatus. I tried removing the ID column from the data frame, but the same problem persists.
I have attached an image of the results I get, along with the corresponding rpart.plot output.
Is my data type, or my basic approach to the data, wrong?
I am a beginner in machine learning. I also tried changing the data type of ID to numeric.
Is there any way to set a hierarchy for the partitioning?
Why is the graph just a line?

[Image: overplotted rpart plot]

Upvotes: 0

Views: 231

Answers (1)

sebastianmm

Reputation: 1176

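There is an error in your formula: predictor variables must be separated by +, not by commas. In your call, only the first predictor (Ethnicity) ends up in the formula; MaritalStatus and EmploymentStatus are passed to rpart() as separate arguments instead of being used as predictors.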

m1 <- rpart(type ~ Ethnicity + MaritalStatus + EmploymentStatus, 
      data = table2, method = "anova")
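One way to see what R did with the original call is to match it against rpart()'s argument list without running it. This is a minimal sketch, assuming the rpart package is installed (it ships with standard R distributions as a recommended package); it uses only base R's quote(), match.call() and all.vars(). The call is never evaluated, so table2 does not need to exist. The names bad_call, matched and good_formula are just illustrative.

```r
library(rpart)

# The original call, quoted so it is not evaluated (table2 need not exist).
bad_call <- quote(
  rpart(type ~ Ethnicity, MaritalStatus, EmploymentStatus,
        data = table2, method = "anova")
)

# Match the quoted call against rpart()'s formal arguments
# (formula, data, weights, subset, ...). The named arguments data and
# method are matched first; the unnamed ones then fill the remaining
# formals in order: formula, weights, subset.
matched <- match.call(rpart, bad_call)
print(matched)

# The comma-separated variables landed in weights and subset, not the formula.
print(matched$weights)
print(matched$subset)

# Right-hand side of the original formula: only Ethnicity.
print(all.vars(matched$formula[[3]]))

# The corrected formula joins the predictors with +, so all three are terms.
good_formula <- type ~ Ethnicity + MaritalStatus + EmploymentStatus
print(all.vars(good_formula[[3]]))
```

Joining the predictors with + keeps all of them inside the formula, which is where rpart() expects its predictors; anything after an unnamed comma is treated as the next positional argument instead.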

Upvotes: 1
