Reputation: 69
I am using the exact code for best-first search from page 4 of this CRAN document (https://cran.r-project.org/web/packages/FSelector/FSelector.pdf), which uses the iris dataset. It works fine on iris, but not on my own data. My data has 37 predictor variables (both numerical and categorical), with the class label, Class, in the 38th column.
I'm getting the error:
Error in predict.rpart(tree, test, type = "c") :
Invalid prediction for "rpart" object
Which I think comes from this line:
error.rate = sum(test$Class != predict(tree, test, type="c")) / nrow(test)
I've tried debug() and traceback(), but I don't understand why this error occurs (and, as I said, it's not reproducible with the iris data).
Here's some of my data so you can see what I'm working with:
> head(data)
Numeric Binary Binary.1 Categorical Binary.2 Numeric.1 Numeric.2 Numeric.3 Numeric.4 Numeric.5 Numeric.6
1 42 1 0 1 0 27.38953 38.93202 27.09122 38.15687 9.798653 18.57313
2 43 1 0 3 0 76.34071 75.18190 73.66722 72.39449 23.546124 54.29957
3 67 0 0 1 0 485.87158 287.35052 471.58863 281.55261 73.454080 389.40092
4 49 0 0 3 0 200.83924 171.77136 164.33999 137.13165 36.525225 122.74080
5 42 1 1 2 0 421.56508 243.05138 388.66823 221.17644 57.803488 285.72923
6 48 1 1 2 0 69.48605 68.86291 67.57764 66.68408 16.661986 43.27868
Numeric.7 Numeric.8 Numeric.9 Numeric.10 Numeric.11 Numeric.12 Numeric.13 Numeric.14 Numeric.15 Numeric.16
1 1.9410 1.6244 1.4063 3.761285 11.07121 12.00510 1.631108 2.061702 0.7911462 1.0196401
2 2.7874 2.4975 1.8621 4.519124 18.09848 15.46028 2.069787 2.650712 0.7808421 0.9650938
3 4.9782 4.5829 4.0747 10.165202 24.66558 18.26303 2.266640 3.504340 0.6468095 1.8816444
4 3.4169 3.0646 2.7983 7.275817 15.15534 13.93672 2.085589 2.309878 0.9028999 1.6726948
5 5.2302 3.7912 3.4401 7.123413 59.64406 28.71171 3.311343 5.645815 0.5865128 0.8572746
6 2.9730 2.2918 1.5164 4.541603 26.81567 18.67885 2.637904 3.523510 0.7486581 0.7908798
Numeric.17 Numeric.18 Numeric.19 Numeric.20 Categorical.1 Numeric.21 Numeric.22 Numeric.23 Numeric.24
1 2.145868 1.752803 64.91618 41.645192 1 9.703708 1.116614 0.09654643 4.0075897
2 2.336676 1.933997 19.93420 11.824950 3 31.512059 1.360054 0.03559176 0.5806225
3 5.473179 1.857276 44.22981 33.698516 1 8.498998 1.067967 0.04122081 0.7760942
4 3.394066 2.143688 10.61420 29.636776 3 39.734071 1.549718 0.04577881 0.3102006
5 1.744118 4.084250 38.28577 87.214615 2 59.519129 2.132184 0.16334461 0.3529899
6 1.124962 4.037118 58.37065 3.894945 2 64.895248 2.190225 0.13461692 0.2672686
Numeric.25 Numeric.26 Numeric.27 Numeric.28 Numeric.29 Numeric.30 Numeric.31 Class
1 0.065523088 1.012919 1.331637 0.18721221 645.60854 144.49088 20.356321 FALSE
2 0.030128214 1.182271 1.633734 0.10035377 206.18575 142.63844 24.376264 FALSE
3 0.005638842 0.802835 1.172351 0.07512149 81.98983 91.44951 18.949937 FALSE
4 0.061873262 1.323395 1.733104 0.12725994 51.14379 113.19654 28.529134 FALSE
5 0.925931194 1.646710 3.096853 0.39408020 151.65062 103.64733 6.769099 FALSE
6 0.548181302 1.767779 2.547693 0.34173633 46.10354 111.04652 9.658817 FALSE
Upvotes: 2
Views: 23802
Reputation: 341
I'm not very familiar with the rpart package yet, so I might be wrong, but this works for me:
Try using type = "vector" instead of type = "c". Your variable Class is logical, so rpart should have generated a regression tree (method "anova"), not a classification tree. The documentation of predict.rpart states that the types "class" and "prob" are only meant for classification trees, which is why the call fails with "Invalid prediction".
With the following code you can get your predicted classes:
your_threshold <- 0.5
predicted_classes <- predict(tree, test, type = "vector") >= your_threshold
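If you want to check which kind of tree rpart actually fitted, you can look at tree$method. Here is a minimal sketch of that check; I don't have your data, so it assumes a stand-in built from iris with a made-up logical Class column (d and res are names I chose for illustration):

```r
library(rpart)

# Hypothetical stand-in for the question's data: iris with a logical response
d <- iris[, 1:4]
d$Class <- iris$Species == "setosa"   # logical, like the Class column in the question

tree <- rpart(Class ~ ., data = d)
print(tree$method)                    # "anova", i.e. a regression tree

# type = "class" is rejected for a regression tree; capture the message
res <- tryCatch(predict(tree, d, type = "class"),
                error = function(e) conditionMessage(e))
print(res)

# type = "vector" returns numeric fitted values that can be thresholded
predicted_classes <- predict(tree, d, type = "vector") >= 0.5
print(mean(predicted_classes == d$Class))  # share of matching predictions
```

The 0.5 cut-off is the same arbitrary threshold as above; a different value may suit unbalanced classes better.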
Alternatively, you can convert your variable Class to a factor before training the tree. rpart will then build a classification tree:
library(rpart)
# Convert before creating the train/test subsets so both carry the factor
data$Class <- factor(data$Class)
tree <- rpart(Class ~ ., data)
predicted_classes <- predict(tree, test, type = "class") # or type = "c" if you prefer
Your choice ;) Hope that helps!
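For the cross-validated evaluator from the question, the factor route could look roughly like the sketch below. It is not the code from the FSelector manual: it assumes a stand-in data frame dat built from iris, uses reformulate() from base R instead of FSelector's as.simple.formula, and assigns folds with sample() rather than runif(). The names dat, evaluator, folds and acc are mine.

```r
library(rpart)

# Hypothetical stand-in data: iris with a logical Class column
dat <- iris[, 1:4]
dat$Class <- iris$Species == "versicolor"

# Convert once, before any train/test split, so every subset shares the levels
dat$Class <- factor(dat$Class)

evaluator <- function(subset, data = dat, k = 5) {
  # Assign each row to one of k folds at random
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  accuracy <- sapply(seq_len(k), function(i) {
    train <- data[folds != i, , drop = FALSE]
    test  <- data[folds == i, , drop = FALSE]
    # reformulate() builds Class ~ var1 + var2 + ... from the subset names
    tree <- rpart(reformulate(subset, response = "Class"), data = train)
    # Works because the response is a factor, so this is a classification tree
    mean(test$Class == predict(tree, test, type = "class"))
  })
  mean(accuracy)
}

set.seed(1)
acc <- evaluator(c("Petal.Length", "Petal.Width"))
print(acc)
```

Since the extra arguments have defaults, a function of this shape should be usable as the eval.fun in best.first.search(names(dat)[names(dat) != "Class"], evaluator), though I have not run that call here.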
Upvotes: 12