Reputation: 13
I need to analyse negative or positive text messages and find out which words define a positive or negative text. At this point, I need to split the data into a training set and a test set. However, I get this error:
library(caTools)
split = sample.split(smsSparse$sentiment, SplitRatio = .7)
# Error in sample.split(smsSparse$sentiment, SplitRatio = 0.7) :
# Error in sample.split: 'SplitRatio' parameter has to be i [0, 1] range or [1, length(Y)] range
As suggested in this post, I changed "smsSparse$Negative = sms$Negative" to "smsSparse$Negative = sms$negative", but it didn't help. I also tried 7/10 and 0,7 instead of 0.7, with the same result.
Can someone tell me why R thinks that 0.7 is not between 0 and 1?
Upvotes: 1
Views: 8619
Reputation: 77
sample.split works when the caTools package is installed and loaded. You can install it with
install.packages('caTools')
then load it with
library('caTools')
After running the above lines, you can then do something like this
split = sample.split(smsSparse$sentiment, SplitRatio = 0.7)
If, for instance, your data frame is called dataset, you can then do something like
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)
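Put together, the steps above might look like the following self-contained sketch. It uses the built-in iris data set and its Species column purely as stand-ins for smsSparse and sentiment (an assumption for illustration), and assumes caTools is already installed. Per the caTools documentation, sample.split tries to keep the relative proportion of each label the same in both subsets.

```r
library(caTools)

set.seed(123)        # make the random split reproducible
dataset <- iris      # stand-in for smsSparse

# One logical flag per row: TRUE rows go to the training set (about 70%)
split <- sample.split(dataset$Species, SplitRatio = 0.7)

training_set <- subset(dataset, split == TRUE)
test_set     <- subset(dataset, split == FALSE)

nrow(training_set)
nrow(test_set)
```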
Upvotes: 0
Reputation: 31
Looking at the source code of the sample.split function in caTools, you will see the following lines:
if (SplitRatio >= nSamp)
stop("Error in sample.split: 'SplitRatio' parameter has to be i [0, 1] range or [1, length(Y)] range")
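There could be two reasons for this error: 1) the length of your data is less than or equal to SplitRatio, or 2) the first argument to sample.split is NULL. The second is really a special case of the first: length(NULL) is 0, so even SplitRatio = 0.7 fails the check.
Make sure the vector you pass as the first argument actually contains data.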
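To see why 0.7 is rejected, here is a small sketch that mirrors only the guard quoted above. The helper name split_ratio_fails is hypothetical, and it assumes, as the error message suggests, that nSamp is length(Y); it is not the caTools implementation itself.

```r
# Hypothetical helper mirroring the quoted check; assumes nSamp <- length(Y)
split_ratio_fails <- function(Y, SplitRatio) {
  nSamp <- length(Y)
  SplitRatio >= nSamp   # TRUE means sample.split() would stop() with the error
}

split_ratio_fails(NULL, 0.7)            # TRUE: length(NULL) is 0, and 0.7 >= 0
split_ratio_fails(c(1, 0, 1, 1), 0.7)   # FALSE: 4 values, so 0.7 is accepted
```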
Upvotes: 0
Reputation: 424
set.seed(1000)
library(caTools)
split = sample.split(letters$isB, SplitRatio = 0.5)
Here isB is the label of the dependent variable; look up the corresponding column name in your own dataset and use it instead.
Here you can find why this error is raised.
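One thing to watch in this example: in a fresh R session, letters is base R's built-in character vector of the 26 lowercase letters, so letters$isB assumes you have already loaded your own data frame under that name. The sketch below shows the difference; letters_df and its columns are hypothetical.

```r
# Base R's built-in `letters` is a character vector, so `$` cannot be used on it
msg <- tryCatch(base::letters$isB, error = function(e) conditionMessage(e))
print(msg)

# A hypothetical data frame with an isB outcome column
letters_df <- data.frame(isB = c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE),
                         x1  = c(2, 5, 4, 1, 3, 2))

# Confirm the label column exists before passing it to sample.split()
"isB" %in% names(letters_df)
```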
Upvotes: 2
Reputation: 126
As someone correctly mentioned, this is most likely an assignment error: for example a spelling mistake, or the column does not exist or is NULL. If the column you are splitting on (the dependent variable) is not a factor, you can also convert it to one, although sample.split does not strictly require a factor. To check quickly, look at summary(smsSparse$sentiment) and confirm that it describes actual data rather than a NULL object.
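A few quick checks along these lines, sketched on a small hypothetical smsSparse (the columns great, awful and the "pos"/"neg" labels are made up for illustration):

```r
# Hypothetical data standing in for smsSparse
smsSparse <- data.frame(great = c(1, 0, 0, 1), awful = c(0, 1, 1, 0),
                        sentiment = c("pos", "neg", "neg", "pos"))

"sentiment" %in% names(smsSparse)  # FALSE would point to a missing or misspelled column
str(smsSparse$sentiment)           # prints " NULL" if the column does not exist
summary(smsSparse$sentiment)

# Optional: store the label as a factor (sample.split also accepts other vectors)
smsSparse$sentiment <- as.factor(smsSparse$sentiment)
levels(smsSparse$sentiment)
```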
Upvotes: 0
Reputation: 1
Check whether smsSparse$sentiment is assigned correctly. If a mistake happened during cbind, or a column name is misspelled, R throws an error like this.
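One plausible way this happens: if the right-hand side of an assignment is a misspelled column, it evaluates to NULL, and assigning NULL with $<- silently creates no column at all. The later smsSparse$sentiment is then NULL too, with length 0, which is what trips the SplitRatio check. The names sms, smsSparse and their columns below are illustrative stand-ins, not the asker's real data.

```r
# Hypothetical stand-ins for the original data and the sparse term matrix
sms <- data.frame(text = c("love it", "hate it", "great"),
                  sentiment = c(1, 0, 1))
smsSparse <- data.frame(love = c(1, 0, 0), hate = c(0, 1, 0), great = c(0, 0, 1))

# Typo: sms has "sentiment", not "Sentiment", so the right-hand side is NULL
smsSparse$sentiment <- sms$Sentiment

is.null(smsSparse$sentiment)       # TRUE: no column was created, and no warning
"sentiment" %in% names(smsSparse)  # FALSE

# Correct spelling: the column is copied and its length matches nrow(smsSparse)
smsSparse$sentiment <- sms$sentiment
length(smsSparse$sentiment)        # 3
```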
Upvotes: 0
Reputation: 701
I have never used the function sample.split before. However, I normally partition my data without such a function. For example, say I want to partition the iris data set into a training set and a test set, and I want the training set to be about 70% of the size of the original data set. Then I can do this:
data(iris)
#Draw a random sample of row indices from 1 to nrow(iris), without replacement
samp <- sample(1:nrow(iris), size=round(0.7*nrow(iris)), replace=FALSE)
train <- iris[samp,] #Only takes rows that are in samp
test <- iris[-samp,] #Omits the rows that were in samp
The same can be done with a vector, except that the comma is not needed: use [samp] and [-samp] instead of [samp,] and [-samp,]. I hope that helps. Otherwise, posting the first 6 entries of smsSparse$sentiment (for example with head(smsSparse$sentiment)) might help people identify the problem.
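For the vector case, a minimal sketch (using iris$Species only as an example vector, and adding set.seed() so the split is reproducible) could be:

```r
data(iris)
set.seed(42)   # reproducible random split

y <- iris$Species
# Draw 70% of the positions 1..length(y) without replacement
samp <- sample(seq_along(y), size = round(0.7 * length(y)), replace = FALSE)

y_train <- y[samp]    # no comma needed when indexing a vector
y_test  <- y[-samp]   # every position not drawn above

length(y_train)  # 105
length(y_test)   # 45
```

Unlike sample.split, plain sample() does not try to keep the proportion of each label the same in both parts, so with a small or unbalanced label vector the two sets can end up with noticeably different class mixes.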
Upvotes: 1