user288609

Reputation: 13065

Regarding random number generation in a sequential sampling process

I have a list of data, such as:

12345
23456
67891
-20000
200
600
20
...

Assume the size of this data set (i.e. the number of lines in the file) is N. I want to randomly draw m lines from this data file and write them to one file, and put the remaining N-m lines into another file. I can randomly draw an index on each of m iterations to get those m lines. What confuses me is how to ensure that the m randomly drawn lines are all different.

Is there a way to do that in R?

Upvotes: 1

Views: 360

Answers (2)

Jason Morgan

Reputation: 2330

I'm not entirely sure I understand the question, but here is one way to sample without replacement from a vector and then split that vector into two based on the sampling. This could be easily extended to other data types (e.g., data.frame).

## Example data vector.
X <- c(12345, 23456, 67891, -20000, 200, 600, 20)

## Length of data.
N <- length(X)

## Sample from the data indices, without replacement.
sampled.idx <- sample(1:N, 2, replace=FALSE)

## Select the sampled data elements.
(sampled <- X[sampled.idx])

## Select the non-sampled data elements.
(rest <- X[!(1:N %in% sampled.idx)])

## Update: A better way to do the last step.
## Thanks to @PLapointe's comment below.
(rest <- X[-sampled.idx])
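
As a rough sketch of the data.frame extension mentioned above, the same index-based split can be applied to rows. The data frame `df`, its column names, and the value of `m` are assumptions made up for illustration. `setdiff()` is used instead of negative indexing here because `df[-idx, ]` returns zero rows when `idx` is empty (for example when `m` is 0).

```r
## Hypothetical data frame; column names are assumptions for illustration.
df <- data.frame(id = 1:7,
                 value = c(12345, 23456, 67891, -20000, 200, 600, 20))

## Number of rows to draw (assumed value).
m <- 3

## Draw m distinct row indices (sampling without replacement).
idx <- sample(nrow(df), m)

## Rows that were drawn.
sampled_df <- df[idx, ]

## Rows that were not drawn; setdiff() also behaves well when idx is empty.
rest_df <- df[setdiff(seq_len(nrow(df)), idx), ]
```

The two pieces together contain every row of `df` exactly once, with the sampled rows in their randomly drawn order.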

Upvotes: 3

Hong Ooi

Reputation: 57697

Yes, use sample(N, size=m, replace=FALSE) to get a random sample of m out of N without replacement. Or just sample(N, m) since replace=FALSE is the default.
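
For the file-based task in the question, one possible sketch is shown below. The file names and the value of `m` are assumptions; a temporary input file is created only so the example runs on its own. It uses `sample.int()`, which draws from 1..N and also defaults to sampling without replacement.

```r
## Hypothetical input file, created here so the sketch is self-contained.
in_file <- tempfile(fileext = ".txt")
writeLines(c("12345", "23456", "67891", "-20000", "200", "600", "20"), in_file)

## Read all lines; N is the number of lines in the file.
lines <- readLines(in_file)
N <- length(lines)

## Number of lines to draw (assumed value).
m <- 3

## m distinct line indices; replace = FALSE is the default.
idx <- sample.int(N, m)

## Hypothetical output files for the sampled and the remaining lines.
sampled_file <- tempfile(fileext = ".txt")
rest_file <- tempfile(fileext = ".txt")
writeLines(lines[idx], sampled_file)
writeLines(lines[setdiff(seq_len(N), idx)], rest_file)
```

Because the indices are drawn without replacement, no line can be picked twice, and the two output files together hold all N lines.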

Upvotes: 4
