Reputation: 49
For a school project, I want to randomly select a proportion of a dataset into a new subset using the "sample" function in base R, while also storing the non-sampled observations in another data frame.
The following code yields a random sample of my made-up data frame.
DATA <- data.frame(x=c(3, 5, 6, 6, 8, 12, 14),
y=c(12, 6, 4, 23, 25, 8, 9),
z=c(2, 7, 8, 8, 15, 17, 29))
sample <- DATA[sample(1:nrow(DATA), floor(nrow(DATA)*0.7), replace = FALSE),]
However, I run into trouble when I want to extract the non-sampled observations as well. Most resources I've come across suggest something like the code below,
training <- DATA[sample,]
testing <- DATA[-sample,]
but that option yields the error message
Error in xj[i] : invalid subscript type 'list'
Any help to solve the situation would be greatly appreciated.
Upvotes: 0
Views: 947
Reputation: 388817
In your code, sample is a data frame. You need sample to be a vector of row indices for DATA[sample,] and DATA[-sample,] to work.
sample <- sample(nrow(DATA), floor(nrow(DATA)*0.7))
training <- DATA[sample,]
testing <- DATA[-sample,]
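As a minimal sketch of the same split with a couple of sanity checks added: the name idx and the seed value 42 are illustrative choices, not something from the question. A different name is used only so the index vector does not share its name with the sample() function.

```r
# Same made-up data as in the question.
DATA <- data.frame(x = c(3, 5, 6, 6, 8, 12, 14),
                   y = c(12, 6, 4, 23, 25, 8, 9),
                   z = c(2, 7, 8, 8, 15, 17, 29))

set.seed(42)  # arbitrary seed, only so the split is reproducible

# 'idx' is an illustrative name: a vector of row numbers, not a data frame.
idx <- sample(nrow(DATA), floor(nrow(DATA) * 0.7))

training <- DATA[idx, ]   # the sampled rows
testing  <- DATA[-idx, ]  # every row that was not sampled

# Sanity checks: idx is a plain integer vector, and the two pieces
# together cover each row of DATA exactly once.
stopifnot(is.integer(idx))
stopifnot(nrow(training) + nrow(testing) == nrow(DATA))
stopifnot(length(intersect(rownames(training), rownames(testing))) == 0)
```

Because the row numbers are stored on their own, the same idx can be reused for both the positive and the negative subscript.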
The sample(..) call is simplified compared with the question: sample(1:nrow(DATA), ..) is the same as sample(nrow(DATA), ..), and replace is FALSE by default in sample, so there is no need to mention it explicitly.
Upvotes: 1