Prathit

Reputation: 87

Random Forest Tree for classification

I am trying a random forest (RF) for the first time. I want to predict the genre of a game from its average user rating, user rating count, and price.

library(dplyr)   # %>% and select()
library(tidyr)   # drop_na() and separate()
data <- read.csv("appstore_games.csv")
data <- data %>% drop_na()
data <- data %>% select(Average.User.Rating, User.Rating.Count, Price, Age.Rating, Genres)
data <- data %>% separate(Genres, c("Main Genre","Genre1","Genre2","Genre3"), extra = "drop" )
data1 <- data %>% select(Genre1 , Average.User.Rating, User.Rating.Count, Price )
str(data1)
data1$Genre1 <- as.factor(data1$Genre1)
set.seed(123)
sample <- sample(2 , nrow(data1),replace = TRUE, prob = c(0.7,0.3))
train_data <- data1[sample == 1,]
test_data <- data1[sample == 2,]
library(randomForest)
set.seed(1)
rf <- randomForest(train_data$Genre1 ~., data = train_data , proximity = TRUE, ntree = 200, importance = TRUE)

It fails at this point with the error:

Error in randomForest.default(m, y, ...) : Can't have empty classes in y.

What is wrong here? Thanks. The genre column has values such as Strategy, Entertainment, etc.

Upvotes: 1

Views: 271

Answers (2)

Fowzan

Reputation: 21

Try using this before you pass the data to the model:

train_data <- droplevels(train_data)
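To illustrate why this can help, here is a minimal sketch on made-up data (the `toy` data frame and its values are invented for illustration, not taken from appstore_games.csv). A subset of a factor keeps every original level, even those with no rows left, and `droplevels()` removes them:

```r
# Toy data (hypothetical): three genres, "Puzzle" occurs only once.
toy <- data.frame(
  Genre1 = factor(c("Strategy", "Strategy", "Entertainment", "Puzzle", "Entertainment")),
  Price  = c(0, 1.99, 0, 0.99, 2.99)
)

# A subset that happens to miss "Puzzle" still carries it as a level.
toy_train <- toy[toy$Genre1 != "Puzzle", ]
levels_before <- levels(toy_train$Genre1)   # still includes "Puzzle"
counts_before <- table(toy_train$Genre1)    # "Puzzle" has a count of 0

# droplevels() removes factor levels that have no observations.
toy_train <- droplevels(toy_train)
levels_after <- levels(toy_train$Genre1)
```

A random 70/30 split can leave a rare genre with no rows in the training set in the same way, which would probably explain the "empty classes" message.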

Upvotes: 2

padul

Reputation: 174

I am not completely sure, but I think this can happen if not all levels of your Y are represented in the training data. You could check this.

My other idea is that one of your classes in Y is "None".
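Both ideas can be checked with a few lines of base R. This is only a sketch: `check_classes` is a hypothetical helper name, and the example factor is made up. It reports levels with zero observations, the number of missing values (which `separate()` can produce when a row has fewer pieces than columns), and whether a level called "None" exists:

```r
# Hypothetical helper: summarise possible problems in a response factor y.
check_classes <- function(y) {
  counts <- table(y)  # one entry per level, including levels with 0 rows
  list(
    empty     = names(counts)[as.vector(counts) == 0],  # levels with no observations
    n_missing = sum(is.na(y)),                          # NA responses
    has_none  = "None" %in% levels(y)                   # a literal "None" class
  )
}

# Made-up example: "Puzzle" is a declared level but never occurs.
y <- factor(c("Strategy", "None", NA, "Strategy"),
            levels = c("Strategy", "Puzzle", "None"))
res <- check_classes(y)
```

In the question's setting, the call would be `check_classes(train_data$Genre1)`.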

Upvotes: 2
