Joris de Jong

Reputation: 13

Error in sample.split in R, 'SplitRatio' parameter has to be i [0, 1]

I need to analyse negative or positive text messages, and find out which words define a positive or negative text. At this point, I need to split the data between a test set and a training set. However, this happens:

library(caTools)
split = sample.split(smsSparse$sentiment, SplitRatio = .7)
# Error in sample.split(smsSparse$sentiment, SplitRatio = 0.7) : 
#   Error in sample.split: 'SplitRatio' parameter has to be i [0, 1] range or [1, length(Y)] range

As suggested in this post, I changed "smsSparse$Negative = sms$Negative" to "smsSparse$Negative = sms$negative", but it didn't help. I also tried 7/10 and 0,7 instead of 0.7, with the same result.

Can someone tell me why R thinks that 0.7 is not between 0 and 1?

Upvotes: 1

Views: 8619

Answers (6)

Anthony

Reputation: 77

sample.split comes from the caTools package, so it only works once caTools is installed and loaded. You can install it with

install.packages('caTools')

then load it with

library('caTools')

After running the above lines, you can then do something like this

split = sample.split(smsSparse$sentiment, SplitRatio = 0.7)

If, for example, your data frame is called dataset, you can then build the two sets like this:

training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)
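If caTools is not available, a rough base-R stand-in for a stratified split (keeping about the same proportion of each label in both sets, which is what sample.split aims for) might look like the sketch below. The helper name stratified_split and the toy sentiment vector are hypothetical, and this is not caTools' exact algorithm.

```r
# Hypothetical helper: a rough base-R stand-in for a stratified split.
# Returns a logical vector: TRUE for rows assigned to the training set.
stratified_split <- function(y, ratio = 0.7) {
  if (length(y) == 0) stop("y is empty or NULL; check the column name")
  in_train <- logical(length(y))
  for (label in unique(y)) {
    idx <- which(y == label)                 # rows carrying this label
    n_train <- round(ratio * length(idx))    # training share for this label
    # sample.int() avoids sample()'s special case when idx has length 1
    in_train[idx[sample.int(length(idx), n_train)]] <- TRUE
  }
  in_train
}

set.seed(42)
sentiment <- rep(c("positive", "negative"), times = c(60, 40))
split <- stratified_split(sentiment, ratio = 0.7)
table(sentiment, split)   # 42 of 60 positives and 28 of 40 negatives are TRUE
```

The resulting logical vector can be passed to subset() in the same way as the output of sample.split above.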

Upvotes: 0

Abhinaw Sharma

Reputation: 31

If you look at the source code of sample.split in caTools, you will see the following check:

if (SplitRatio >= nSamp)
    stop("Error in sample.split: 'SplitRatio' parameter has to be i [0, 1] range or [1, length(Y)] range")

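There could be two reasons for this error: (1) the length of your data is not greater than SplitRatio, or (2) the first argument to sample.split is NULL (for example, because the column does not exist), in which case its length is 0 and any SplitRatio triggers the check.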

Make sure the first argument you pass actually contains data.
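To see why a missing column produces this message: extracting a nonexistent column with $ returns NULL, and length(NULL) is 0, so a SplitRatio of 0.7 satisfies SplitRatio >= nSamp. The sketch below reproduces only that comparison in base R; the data frame and its column names are hypothetical.

```r
# Hypothetical data frame: the label column is called "Negative", not "sentiment"
smsSparse <- data.frame(free = c(1, 0, 2), call = c(0, 1, 0),
                        Negative = c(TRUE, FALSE, TRUE))

y <- smsSparse$sentiment   # no such column, so this is NULL
nSamp <- length(y)         # length(NULL) is 0
SplitRatio <- 0.7

# The same comparison sample.split makes before stopping with the error
print(is.null(y))            # TRUE
print(SplitRatio >= nSamp)   # TRUE, so the error would be raised
```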

Upvotes: 0

Anant

Reputation: 424

set.seed(1000)
library(caTools)
split = sample.split(letters$isB, SplitRatio = 0.5)

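Here isB is the dependent-variable column of the letters data frame I was using (not R's built-in letters vector); replace it with the name of the dependent-variable column in your own dataset.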

Here you can find why this error is raised.

Upvotes: 2

Aditya Arora

Reputation: 126

As someone correctly mentioned, this is most likely an assignment error: a spelling mistake, or the column does not exist or is NULL. It can also happen if the column you are splitting on (the dependent variable) is not a factor, in which case you can convert it to one. To check quickly, look at summary(smsSparse$sentiment) and confirm it contains what you expect.
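A few quick checks along those lines, as a sketch on a hypothetical toy data frame (only the column name sentiment is taken from the question):

```r
smsSparse <- data.frame(free = c(1, 0, 2, 0),
                        sentiment = c("pos", "neg", "pos", "neg"))

"sentiment" %in% names(smsSparse)   # does the column exist (exact, case-sensitive)?
any(is.na(smsSparse$sentiment))     # are any labels missing?
summary(smsSparse$sentiment)        # for a character column: only Length/Class/Mode

# Convert the dependent variable to a factor, as suggested above
smsSparse$sentiment <- as.factor(smsSparse$sentiment)
summary(smsSparse$sentiment)        # now a count per level
```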

Upvotes: 0

Manoj Subramanyam

Reputation: 1

Check that smsSparse$sentiment is assigned correctly. If something went wrong during cbind, or there is a spelling mistake in a column name, R throws an error like this one.
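One way this goes wrong silently: if the right-hand side of the assignment refers to a misspelled column, it evaluates to NULL, and assigning NULL to a data-frame column does not create that column. The names sms and smsSparse mirror the question; the data are hypothetical.

```r
sms <- data.frame(text = c("win a prize", "see you soon"),
                  sentiment = c("neg", "pos"))
smsSparse <- data.frame(win = c(1, 0), soon = c(0, 1))

smsSparse$sentiment <- sms$Sentiment        # typo (capital S): the RHS is NULL
print("sentiment" %in% names(smsSparse))    # FALSE, nothing was added

smsSparse$sentiment <- sms$sentiment        # correct spelling
print("sentiment" %in% names(smsSparse))    # TRUE
```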

Upvotes: 0

justin1.618

Reputation: 701

I have never used sample.split; I normally partition my data without such a function. For example, say I want to partition the iris data set into a training and a testing data set, with the training set about 70% of the size of the original. Then I can do this:

data(iris)

# Draw a random sample of row indices from 1 to nrow(iris), without replacement
samp <- sample(1:nrow(iris), size=round(0.7*nrow(iris)), replace=FALSE)

train <- iris[samp,]  #Only takes rows that are in samp
test <- iris[-samp,] #Omits the rows that were in samp

The same works for a vector, except that the comma is not needed: use [samp] and [-samp] instead of [samp,] and [-samp,]. I hope that helps. Otherwise, posting the first 6 entries of smsSparse$sentiment might help people identify the problem.
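For the vector case, a minimal sketch of the same idea (the labels vector is hypothetical, and set.seed is added only to make the split reproducible):

```r
set.seed(1)
labels <- rep(c("positive", "negative"), length.out = 20)   # hypothetical labels

# Positions for the training part: 70% of the vector, drawn without replacement
samp <- sample(seq_along(labels), size = round(0.7 * length(labels)), replace = FALSE)

train <- labels[samp]    # no comma needed for a vector
test  <- labels[-samp]   # every position not in samp
length(train)            # 14
length(test)             # 6
```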

Upvotes: 1
