Reputation: 87
I am trying random forest (RF) for the first time. I am trying to predict the genre of a game based on the following factors:
data <- read.csv("appstore_games.csv")
data <- data %>% drop_na()
data <- data %>% select(Average.User.Rating, User.Rating.Count, Price, Age.Rating, Genres)
data <- data %>% separate(Genres, c("Main Genre","Genre1","Genre2","Genre3"), extra = "drop" )
data1 <- data %>% select(Genre1 , Average.User.Rating, User.Rating.Count, Price )
str(data1)
data1$Genre1 <- as.factor(data1$Genre1)
set.seed(123)
sample <- sample(2 , nrow(data1),replace = TRUE, prob = c(0.7,0.3))
train_data <- data1[sample == 1,]
test_data <- data1[sample == 2,]
library(randomForest)
set.seed(1)
rf <- randomForest(train_data$Genre1 ~., data = train_data , proximity = TRUE, ntree = 200, importance = TRUE)
It shows an error at this point:
Error in randomForest.default(m, y, ...) : Can't have empty classes in y.
What is wrong here? Thanks. The genre column has values such as Strategy, Entertainment, etc.
Upvotes: 1
Views: 271
Reputation: 21
Try running this before you pass the data to the model:
train_data <- droplevels(train_data)
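To see why this can help, here is a minimal sketch on a made-up data frame (the name `toy` and its values are hypothetical, not taken from appstore_games.csv). Subsetting a data frame keeps every level of a factor, even levels that no longer have any rows, and `droplevels()` removes those unused levels.

```r
# Toy data with hypothetical values, for illustration only
toy <- data.frame(
  Genre1 = factor(c("Strategy", "Puzzle", "Strategy", "Entertainment")),
  Price  = c(0, 0.99, 2.99, 0)
)

# Drop the "Entertainment" rows; the level itself survives the subset
train_toy <- toy[toy$Genre1 != "Entertainment", ]
levels(train_toy$Genre1)   # still includes "Entertainment"
table(train_toy$Genre1)    # "Entertainment" appears with a count of 0

# droplevels() removes factor levels that no longer occur in the data
train_toy <- droplevels(train_toy)
levels(train_toy$Genre1)   # only the levels that are actually present
```

Note that after dropping levels from the training set, the test set may still contain genres the model never saw, so it is worth comparing the levels of both sets before predicting.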
Upvotes: 2
Reputation: 174
I am not completely sure, but I think this can happen if not every level of your Y is represented in the training data. You could check for that.
My other idea is that one of the classes in your Y is "None".
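Both ideas can be checked with a frequency table. The sketch below uses a small made-up factor (the name `genre` and its values are assumptions for illustration); on the real data you would apply the same calls to `train_data$Genre1`.

```r
# Hypothetical response with one declared level that never occurs
genre <- factor(
  c("Strategy", "Puzzle", "None", "Strategy"),
  levels = c("Entertainment", "None", "Puzzle", "Strategy")
)

# Count the rows per level; a count of 0 means an empty class
counts <- table(genre)
counts

# Names of the levels that have no rows at all
empty_levels <- names(counts)[as.vector(counts) == 0]
empty_levels

# Check whether a placeholder label such as "None" is used as a class
any(genre == "None")
```

If `empty_levels` is not empty, the response has classes without any observations, which is the situation the error message describes.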
Upvotes: 2