Actual characters used (0) when using rpart() in R

Question

I'm trying to build a model that determines whether a review is positive or negative. I've loaded all my data and tokenized it into a data frame whose first column is a factor indicating whether the review is a recommendation or not.

    > str(reviewtokensdf)
    'data.frame':   500 obs. of  270 variables:
     $ recommend       : Factor w/ 2 levels "0","1": 1 2 2 1 2 2 1 2 2 2 ...
     $ made            : num  3 0 0 0 0 0 1 0 0 0 ...
     $ site            : num  1 1 0 0 0 0 0 0 0 0 ...
     $ use             : num  1 0 0 0 1 0 0 0 0 0 ...
     $ one             : num  2 1 0 0 0 0 0 0 0 0 ...
     $ will            : num  1 1 1 0 0 0 0 0 0 0 ...
     $ make            : num  2 1 0 0 1 0 0 0 0 1 ...
     $ book            : num  6 0 0 0 3 0 0 0 0 0 ...
     $ place           : num  3 0 0 0 0 1 0 0 0 0 ...
     $ stay            : num  1 0 0 0 0 0 0 0 0 0 ...
     $ night           : num  1 0 0 2 0 0 0 0 0 1 ...
     $ arriv           : num  1 0 0 0 1 0 0 0 0 0 ...
     $ small           : num  1 0 0 0 0 0 0 0 0 0 ...
     $ floor           : num  1 0 0 3 0 0 1 0 0 0 ...

For now I'm using a smaller subset (n = 500) just for testing purposes, but that shouldn't be a problem. I've been relying heavily on this tutorial (https://medium.com/analytics-vidhya/customer-review-analytics-using-text-mining-cd1e17d6ee4e) for guidance, but I keep running into this problem:

When I use this code:

    tree = rpart(formula = recommend ~ ., data = reviewtokensdf, method = "class",
                 control = rpart.control(minsplit = 200, minbucket = 30, cp = 0.0001))
    printcp(tree)

I expect to see at least some words in the "Variables actually used in tree construction:" section, but it keeps showing character(0) and I have no clue why.

    Classification tree:
    rpart(formula = recommend ~ ., data = reviewtokensdf, method = "class", 
        control = rpart.control(minsplit = 200, minbucket = 30, cp = 1e-04))

    Variables actually used in tree construction:
    character(0)

    Root node error: 40/500 = 0.08

    n= 500 

      CP nsplit rel error xerror xstd
    1  0      0         1      0    0

I tried stripping the rpart() call down to just the basics (dropping rpart.control etc.), but no dice. I also tried things like reviewtokensdf$recommended in the formula field, with the same result.

When I run the example data from the guide I mentioned, it's all fine and dandy, yet I can't see what is different about my data.
