AdeeThyag

Reputation: 125

Error applying the C4.5 decision tree algorithm using the RWeka package in R

I am trying to use the C4.5 decision tree algorithm with 10-fold cross-validation for web spam detection. After feature selection, my dataset has 8944 observations and 36 variables.

Here is my code:

# Divide the dataset into train and test
trainRowNumbers <- createDataPartition(final1$spam, p = 0.7, list = FALSE)
# Create the training dataset
trainData <- final1[trainRowNumbers, ]
# Create the test dataset
testData <- final1[-trainRowNumbers, ]

# C4.5 using 10-fold cross-validation
set.seed(1958)
train_control <- createFolds(trainData$spam, k = 10)
C45Fit <- train(spam ~ ., method = "J48", data = trainData,
                tuneLength = 15,
                trControl = trainControl(
                  method = "cv", indexOut = train_control))

This is the error I get:

C45Fit<-train(spam~.,method="J48",data=trainData,
               tuneLength=15,
               trControl=trainControl(
               method="cv",indexOut = train_control ))

Error in train(spam ~ ., method = "J48", data = trainData, tuneLength = 15, : unused arguments (method = "J48", data = trainData, tuneLength = 15, trControl = trainControl(method = "cv", indexOut = train_control))

I have a couple of questions:

  1. How do I resolve this error?

  2. How do I set the tuneLength parameter?

Head of my dataset:

> head(trainData)
  hostid                           host      HST_4     HST_6     HST_7     HST_8     HST_9    HST_10     HST_16
1      0         007cleaningagent.co.uk 0.03370787 1.9791304 0.1123596 0.1516854 0.2247191 0.2977528 0.07865169
2      1           0800.loan-line.co.uk 1.39539347 2.4222020 0.2284069 0.2610365 0.3531670 0.4529750 0.02879079
4      3 102belfast.boys-brigade.org.uk 0.29729730 1.1800000 0.2162162 0.3783784 0.5135135 0.5405405 0.21621622
5      4  10bristol.boys-brigade.org.uk 0.28804348 1.7745267 0.1141304 0.1847826 0.2608696 0.3750000 0.08152174
6      5  10enfield.boys-brigade.org.uk 0.00000000 0.8468468 0.0625000 0.1875000 0.1875000 0.3125000 0.06250000
8      8             13thcoventry.co.uk 0.05797101 2.1113074 0.2318841 0.3091787 0.3961353 0.5507246 0.09178744
      HST_17    HST_18 HST_20    HMG_29     HMG_40     HMG_41    HMG_42    AVG_50    AVG_51     AVG_55    AVG_57
1 0.15730337 0.2247191  0.070 0.2907760 0.02702703 0.07207207 0.1351351  32431.65  7.215054 0.02289305 0.2980171
2 0.05566219 0.1094050  0.075 0.0495162 0.10641628 0.17840376 0.2410016 150592.89  2.000000 0.49661240 0.1137439
4 0.37837838 0.4054054  0.040 0.2156130 0.03971119 0.11552347 0.1480144  16129.61  2.125000 0.12297815 0.2033877
5 0.13043478 0.2119565  0.075 0.0405612 0.08152174 0.13043478 0.2119565  28759.75  2.870968 0.19622331 0.0673372
6 0.18750000 0.2500000  0.005 0.1125400 0.02528090 0.12359551 0.1432584  70966.61  2.000000 0.03948338 0.2513755
8 0.14975845 0.2512077  0.095 0.1946150 0.04382470 0.10458167 0.1633466 109388.89 11.484940 0.03547817 0.1387366
       AVG_58   AVG_59     AVG_61     AVG_63    AVG_65    AVG_67     STD_77     STD_79       STD_80     STD_81
1 0.030079101 1.888686 0.04982536 0.07119317 0.1539772 0.2237475 0.02240051 0.04634758 0.0003248904 0.07644575
2 0.005874481 2.423238 0.14016213 0.17484142 0.2460647 0.3279534 0.03014901 0.05352347 0.0006170884 0.09449420
4 0.017285860 1.657795 0.08748573 0.14192639 0.2273218 0.2815660 0.03715705 0.07385004 0.0021174754 0.15725521
5 0.007008439 1.656472 0.10088409 0.17370255 0.2791502 0.3839271 0.03382564 0.07695898 0.0011314215 0.14290420
6 0.017145414 2.284363 0.09245673 0.14045514 0.2267635 0.2907555 0.02459505 0.06418522 0.0007756064 0.16533374
8 0.001818059 2.300361 0.17326186 0.25910768 0.3351511 0.4479340 0.05611160 0.07531329 0.0005475770 0.15796253
     STD_83      STD_84     STD_85     STD_87    STD_94   spam
1 0.1219990 0.001009964 0.04043011 0.04198925 0.3400028 normal
2 0.1539489 0.001734261 0.15000000 0.16000000 0.3147682 normal
4 0.2027374 0.006655953 0.06437500 0.06031250 0.7100778 normal
5 0.1925378 0.002708827 0.04258065 0.05290323 0.8195509 normal
6 0.2223814 0.005491305 0.09125000 0.08062500 1.2953592 normal
8 0.2366591 0.002588343 0.21698795 0.14774096 0.2882247 normal

Output of sessionInfo():

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252    LC_MONETARY=English_Australia.1252
[4] LC_NUMERIC=C                       LC_TIME=English_Australia.1252    

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bindrcpp_0.2        ggthemes_3.5.0      randomForest_4.6-12 Metrics_0.1.3       RWeka_0.4-37        mlr_2.12.1         
 [7] ParamHelpers_1.10   rgeos_0.3-26        VIM_4.7.0           data.table_1.10.4-3 colorspace_1.3-2    mice_2.46.0        
[13] RANN_2.5.1          kernlab_0.9-25      mlbench_2.1-1       caret_6.0-79        ggplot2_2.2.1       lattice_0.20-35    
[19] dplyr_0.7.4        

loaded via a namespace (and not attached):
 [1] nlme_3.1-131       lubridate_1.7.3    bit64_0.9-7        dimRed_0.1.0       httr_1.3.1         backports_1.1.2    tools_3.4.0       
 [8] R6_2.2.2           rpart_4.1-11       DBI_0.8            lazyeval_0.2.1     nnet_7.3-12        withr_2.1.0        sp_1.2-7          
[15] tidyselect_0.2.3   mnormt_1.5-5       parallelMap_1.3    bit_1.1-12         curl_3.0           compiler_3.4.0     checkmate_1.8.5   
[22] scales_0.5.0       sfsmisc_1.1-1      DEoptimR_1.0-8     lmtest_0.9-35      psych_1.7.8        robustbase_0.92-8  stringr_1.2.0     
[29] foreign_0.8-67     rio_0.5.10         pkgconfig_2.0.1    RWekajars_3.9.2-1  rlang_0.2.0        readxl_1.0.0       ddalpha_1.3.1     
[36] BBmisc_1.11        bindr_0.1          zoo_1.8-0          ModelMetrics_1.1.0 car_3.0-0          magrittr_1.5       Matrix_1.2-12     
[43] Rcpp_0.12.14       munsell_0.4.3      abind_1.4-5        stringi_1.1.6      carData_3.0-1      MASS_7.3-47        plyr_1.8.4        
[50] recipes_0.1.1      parallel_3.4.0     forcats_0.3.0      haven_1.1.1        splines_3.4.0      pillar_1.2.1       boot_1.3-19       
[57] rjson_0.2.15       reshape2_1.4.2     codetools_0.2-15   stats4_3.4.0       CVST_0.2-1         glue_1.2.0         laeken_0.4.6      
[64] vcd_1.4-4          foreach_1.4.3      twitteR_1.1.9      cellranger_1.1.0   gtable_0.2.0       purrr_0.2.4        tidyr_0.7.2       
[71] assertthat_0.2.0   DRR_0.0.2          gower_0.1.2        openxlsx_4.0.17    prodlim_1.6.1      broom_0.4.3        e1071_1.6-8       
[78] class_7.3-14       survival_2.41-3    timeDate_3042.101  RcppRoll_0.2.2     tibble_1.4.2       rJava_0.9-9        iterators_1.0.8   
[85] lava_1.5.1         ipred_0.9-6       

Thanks in advance for any suggestions.

Upvotes: 1

Views: 761

Answers (1)

Weihuang Wong

Reputation: 13118

I can reproduce the error message as follows:

library(RWeka)
library(caret)
library(mlr)
# Loading required package: ParamHelpers

# Attaching package: ‘mlr’

# The following object is masked from ‘package:caret’:

#     train

# Divide the dataset into train and test
trainRowNumbers <- createDataPartition(iris$Species, p = 0.7, list = FALSE)

#Create the training dataset
trainData <- iris[trainRowNumbers, ]
#Create Test data
testData <- iris[-trainRowNumbers, ]

#C4.5 using 10 fold cross validation
set.seed(1958)
train_control <- createFolds(trainData$Species, k = 10)
C45Fit <- train(Species ~ ., method = "J48", data = trainData,
                tuneLength = 15,
                trControl = trainControl(
                  method = "cv", indexOut = train_control))
# Error in train(Species ~ ., method = "J48", data = trainData, tuneLength = 15,  : 
#   unused arguments (method = "J48", data = trainData, tuneLength = 15, trControl = trainControl(method = "cv", indexOut = train_control))

Notice the startup message "The following object is masked from ‘package:caret’: train". If you load another package that exports a train function (mlr, in this case) after you load caret, an unqualified call to train resolves to the version from the most recently attached package. (This is why I asked for sessionInfo(): to see which packages were loaded. For the same reason, a reproducible example should include the library() calls you ran.) So instead of train from caret, R runs train from mlr (or from some other package you loaded later). As the error message shows, that function does not accept method, data, tuneLength, or trControl, hence the "unused arguments" error.
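If you are unsure which version of a function an unqualified call will pick up, a quick check along these lines may help. This is only a sketch: winning_package is a made-up helper name (it is not part of caret or mlr), and the demo uses median from the stats package so that it runs without any extra packages installed.

```r
# Hypothetical helper (not from caret or mlr): name the package whose
# version of `fn_name` an unqualified call would use right now.
winning_package <- function(fn_name) {
  fn <- get(fn_name, mode = "function")   # same lookup an unqualified call does
  environmentName(environment(fn))        # namespace the function lives in
}

# Every attached package that provides the name, in the order R searches them.
print(find("median", mode = "function"))

# The package that wins the lookup.
print(winning_package("median"))

# With caret attached first and mlr attached afterwards, the same two checks
# on "train" should list both packages and name mlr as the winner:
# find("train", mode = "function")
# winning_package("train")
```

conflicts(detail = TRUE) is another base R call that lists every masked name per attached package, which can be handy when many packages are loaded.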

The solution is to either load caret last, or explicitly call the train function from caret using caret::train(...).
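As a rough illustration of the second option, the iris example could be written as below. It assumes caret and RWeka (with a working Java runtime) are installed. Two things here are my own choices rather than part of the original code: tuneLength is reduced to 3 only to keep the run short, and the folds are passed as training-row indices through index (using createFolds with returnTrain = TRUE) instead of through indexOut, since as far as I know caret derives the held-out rows from index when indexOut is omitted.

```r
library(caret)  # method = "J48" also needs the RWeka package and a Java runtime

set.seed(1958)
in_train  <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
trainData <- iris[in_train, ]

# Training-row indices for each of the 10 folds.
folds <- createFolds(trainData$Species, k = 10, returnTrain = TRUE)

# The caret:: prefix bypasses the search path, so this call reaches caret's
# train even if mlr (or another package exporting train) was attached later.
C45Fit <- caret::train(
  Species ~ ., data = trainData,
  method     = "J48",
  tuneLength = 3,   # smaller than the question's 15, purely to shorten the run
  trControl  = caret::trainControl(method = "cv", index = folds)
)

print(C45Fit)
```

The same prefix works for any masked function, so it is a reasonable habit in scripts that attach several modelling packages at once.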

Upvotes: 2
