Reputation: 3192
I have a large data set with 100k data fields. Calling str() or viewing the full data causes no glitches, but when I run rpart on the training set it takes some time, and after about 3-4 minutes the following error appears:
Error: Unable to establish connection with R session
My script is below:
# Decision tree
library(rpart)
library(rattle)
library(party)
train_set <- read.table('my_sample_trainset.csv', header=TRUE, sep=',', stringsAsFactors=FALSE)
test_set <- read.table('my_sample_testset.csv', header=TRUE, sep=',', stringsAsFactors=FALSE)
my_trained_tree <- rpart(Route ~ Bus_Id + week_days + time_slot, data=train_set, method="class")
# Error occurs on/after this line
my_prediction <- predict(my_trained_tree, test_set, type = "class")
my_solution <- data.frame(Route = my_prediction)
write.csv(my_solution, file = "solution.csv", row.names = FALSE)
Am I missing a library, or does this happen because of the big data set (6.5 MB)?
Also, I am using RStudio version 0.99.447 on Mac OS X Yosemite.
Upvotes: 2
Views: 1409
Reputation: 349
That message means that R is still calculating the results. If you open Activity Monitor and sort by CPU usage on the CPU tab, you should see that rsession is using 100% of a CPU. So you can just click "ok" on that message and allow R to keep computing.
I wish there were a workaround though, this issue is plaguing me as we speak!
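In the meantime, one way to convince yourself that R is working rather than hung is to time the same rpart call on small subsamples and watch how the run time grows. The sketch below is only an illustration: `toy` is a made-up data frame that borrows the column names from the question, and the sizes and level counts are arbitrary assumptions. It also prints the number of distinct values per column, since, as far as I understand, rpart can get much slower when a categorical predictor has many levels and the response has more than two classes.

```r
library(rpart)

# Hypothetical stand-in for train_set: synthetic data that only reuses
# the column names from the question. Replace `toy` with your own data.
set.seed(1)
n <- 2000
toy <- data.frame(
  Route     = factor(sample(paste0("R", 1:5), n, replace = TRUE)),
  Bus_Id    = factor(sample(paste0("B", 1:8), n, replace = TRUE)),
  week_days = factor(sample(c("Mon", "Tue", "Wed", "Thu", "Fri"), n, replace = TRUE)),
  time_slot = factor(sample(1:6, n, replace = TRUE))
)

# Count the distinct values in each column; a column with a very large
# count is a likely suspect for a slow fit.
n_levels <- sapply(toy, function(col) length(unique(col)))
print(n_levels)

# Time the fit on growing random subsamples to see how the cost scales.
for (size in c(500, 1000, 2000)) {
  rows <- sample(nrow(toy), size)
  elapsed <- system.time(
    fit <- rpart(Route ~ Bus_Id + week_days + time_slot,
                 data = toy[rows, ], method = "class")
  )["elapsed"]
  cat(size, "rows:", elapsed, "seconds\n")
}
```

If the timings on your real data climb steeply with the subsample size, a wait of several minutes on the full training set is plausible, and the RStudio message is probably just a symptom of the session being busy.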
Upvotes: 1