Reputation: 27
I'm really new to R and I want to build a random forest. However, I keep getting the same error:
Error in model.frame.default, lengths of variables differ.
I know this issue has been solved in another topic by constructing a formula from strings with as.formula, but I have no idea how to do it. Can you help me, please? Thank you.
#Vector that randomly assigns each row to group 1 (training, ~70%) or group 2 (testing, ~30%)
index = sample(2,nrow(df), replace = TRUE, prob=c(0.7,0.3))
#Training data
training = df[index==1,]
#Testing data
testing = df[index==2,]
#Random forest model
RFM = randomForest(df$Rating~., df$Customer_type, data = training)
Upvotes: 1
Views: 817
Reputation: 329
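The error arises because your dependent (response) variable is df$Rating, taken from the full df dataframe, while you also set data = training. The formula therefore mixes columns from two dataframes with different numbers of rows (df has every row, training only about 70% of them), which is why the variable lengths differ. A random forest cannot take its data from 2 different dataframes like this. The predictor should likewise be named in the formula as Customer_type rather than passed separately as df$Customer_type.
I guess that randomForest(Rating ~ Customer_type, data = training) would work.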
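Since the question also asks about building the formula from strings with as.formula, here is a minimal sketch of how that could look. The toy df, the extra Quantity column, and the names predictors and rf_formula are made up for illustration and are not from the original post; only Rating and Customer_type come from the question. The point is that the formula uses bare column names, so every variable is looked up in training and all lengths match.

```r
library(randomForest)

set.seed(42)  # make the random split reproducible

# Hypothetical toy data standing in for the asker's df (assumption)
df <- data.frame(
  Rating        = runif(200, min = 4, max = 10),
  Customer_type = factor(sample(c("Member", "Normal"), 200, replace = TRUE)),
  Quantity      = sample(1:10, 200, replace = TRUE)
)

# 70% / 30% split, as in the question
index    <- sample(2, nrow(df), replace = TRUE, prob = c(0.7, 0.3))
training <- df[index == 1, ]
testing  <- df[index == 2, ]

# Build the formula from strings: "Rating ~ Customer_type + Quantity"
predictors <- c("Customer_type", "Quantity")
rf_formula <- as.formula(paste("Rating ~", paste(predictors, collapse = " + ")))

# Bare column names only, so every variable is taken from `training`
RFM <- randomForest(rf_formula, data = training)

# Predict on the held-out rows
predictions <- predict(RFM, newdata = testing)
```

With a single predictor, writing randomForest(Rating ~ Customer_type, data = training) directly is equivalent; the string-built formula is mainly useful when the list of predictors is decided at run time.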
Upvotes: 2