JoF

Reputation: 49

Using the "sample" function in R to randomly select rows from a data frame into a new data frame, while also storing non-sampled rows

My objective for a school project is to randomly select a proportion of a dataset into a new subset, while also storing the non-sampled observations in another data frame using the "sample" function in base R.

The following code yields a random sample of my made-up data frame.

DATA <- data.frame(x=c(3, 5, 6, 6, 8, 12, 14),
                   y=c(12, 6, 4, 23, 25, 8, 9),
                   z=c(2, 7, 8, 8, 15, 17, 29))

sample <- DATA[sample(1:nrow(DATA), floor(nrow(DATA)*0.7), replace = FALSE),]

However, I run into trouble when I want to extract the non-sampled observations as well. Most resources I've come across suggest something like the code below,

training <- DATA[sample,]
testing <- DATA[-sample,]

but that option yields the error message

Error in xj[i] : invalid subscript type 'list'

Any help to solve the situation would be greatly appreciated.

Upvotes: 0

Views: 947

Answers (1)

Ronak Shah

Reputation: 388817

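In your code, sample is a data frame (the sampled rows themselves), so using it as a subscript fails with the "invalid subscript type" error. For DATA[sample,] and DATA[-sample,] to work, sample needs to be a vector of row indices.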

sample <- sample(nrow(DATA), floor(nrow(DATA)*0.7))

training <- DATA[sample,]
testing <- DATA[-sample,]

I also simplified the sample(..) call:

  • sample(1:nrow(DATA), ..) is the same as sample(nrow(DATA), ..)
  • replace is FALSE by default in sample, so there is no need to set it explicitly.
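
For completeness, here is a minimal end-to-end sketch of the split on the data from the question. The name train_idx and the seed value are my own choices for illustration, not something from the question. Storing the indices under a name other than sample is optional (R still finds the function when you call sample(...)), but it avoids having an object and a function share one name.

```r
# Made-up data frame from the question
DATA <- data.frame(x = c(3, 5, 6, 6, 8, 12, 14),
                   y = c(12, 6, 4, 23, 25, 8, 9),
                   z = c(2, 7, 8, 8, 15, 17, 29))

# Assumption: a fixed seed, used only to make the split reproducible
set.seed(42)

# train_idx (hypothetical name) holds row numbers, not the rows themselves
train_idx <- sample(nrow(DATA), floor(nrow(DATA) * 0.7))

training <- DATA[train_idx, ]   # sampled rows
testing  <- DATA[-train_idx, ]  # every row that was not sampled

# Sanity check: each row ends up in exactly one of the two subsets
stopifnot(nrow(training) + nrow(testing) == nrow(DATA))
```

With 7 rows and a proportion of 0.7, floor() gives 4 rows in training and the remaining 3 in testing. Which rows end up where depends on the seed.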

Upvotes: 1
