I have a list of data like this:
12345
23456
67891
-20000
200
600
20
...
Assume the size of this data set (i.e., the number of lines in the file) is N. I want to randomly draw m lines from this data file and write them to one file, and put the remaining N-m lines into another file. I can randomly draw an index over m iterations to get those m lines. What confuses me is how to ensure that the m randomly drawn lines are all different.
Is there a way to do that in R?
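A minimal sketch of the whole file-to-file workflow, assuming the data sit one value per line in a plain-text file. The file names (`sampled.txt`, `rest.txt`) and the choice `m <- 3` are hypothetical placeholders; to stay self-contained, the sketch first writes the example values from the question to a temporary input file.

```r
## Write the example values to a temporary input file
## (hypothetical stand-in for the real data file).
infile <- tempfile(fileext = ".txt")
writeLines(c("12345", "23456", "67891", "-20000", "200", "600", "20"), infile)

## Read all N lines as character strings.
lines <- readLines(infile)
N <- length(lines)

## Number of lines to draw (hypothetical choice; must satisfy m <= N).
m <- 3

## Draw m distinct line indices; replace = FALSE (the default) rules out repeats.
set.seed(1)  # only for reproducibility
idx <- sample(N, m)

## Output files (hypothetical names, placed in the session's temp directory).
sampled_file <- file.path(tempdir(), "sampled.txt")
rest_file <- file.path(tempdir(), "rest.txt")

## The m drawn lines go to one file, the remaining N - m lines to the other.
writeLines(lines[idx], sampled_file)
writeLines(lines[-idx], rest_file)
```

One edge case: if m can be 0, `idx` is empty and `lines[-idx]` returns nothing rather than every line, so `lines[setdiff(seq_len(N), idx)]` is the safer complement there.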
I'm not entirely sure I understand the question, but here is one way to sample without replacement from a vector and then split that vector in two based on the sample. This could easily be extended to other data types (e.g., data.frame).
## Example data vector.
X <- c(12345, 23456, 67891, -20000, 200, 600, 20)
## Length of data.
N <- length(X)
## Number of elements to draw (m in the question).
m <- 2
## Sample m of the data indices, without replacement.
sampled.idx <- sample(1:N, m, replace=FALSE)
## Select the sampled data elements.
(sampled <- X[sampled.idx])
## Select the non-sampled data elements.
(rest <- X[!(1:N %in% sampled.idx)])
## Update: A better way to do the last step.
## Thanks to @PLapointe's comment below.
(rest <- X[-sampled.idx])
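As noted above, the same idea extends to a data.frame. A minimal sketch, assuming a hypothetical data frame `df` with one row per line of the file: sample row indices instead of element indices, and use `drop = FALSE` so both parts stay data frames.

```r
## Hypothetical data frame built from the example values.
df <- data.frame(value = c(12345, 23456, 67891, -20000, 200, 600, 20))

## Number of rows to draw (hypothetical choice).
m <- 2

## Sample m distinct row indices without replacement.
row.idx <- sample(nrow(df), m)

## Sampled rows and the remaining rows; drop = FALSE keeps both as data frames.
sampled.df <- df[row.idx, , drop = FALSE]
rest.df <- df[-row.idx, , drop = FALSE]
```

Both results keep the original row names, so you can still tell which lines of the file ended up in which part.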
Yes, use sample(N, size=m, replace=FALSE) to get a random sample of m out of N without replacement, or just sample(N, m), since replace=FALSE is the default.
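A small sketch contrasting the two settings, with N = 7 and m = 5 chosen arbitrarily for illustration. Without replacement the m indices are always distinct; with replacement, repeats are possible, which is exactly the problem raised in the question. Note that sample(N, m) draws from 1:N only because N is a single number: R's sample() treats a length-one numeric x >= 1 as 1:x, and sample.int() is the explicit alternative.

```r
N <- 7
m <- 5

## Without replacement (the default): m distinct integers from 1:N.
idx <- sample(N, m)
anyDuplicated(idx) == 0   # always TRUE

## With replacement, the same index can be drawn more than once.
with.rep <- sample(N, m, replace = TRUE)

## The complement: the N - m indices that were not drawn.
rest.idx <- setdiff(seq_len(N), idx)

## sample.int(N, m) draws m distinct indices from 1:N explicitly,
## avoiding sample()'s special case for a length-one x.
idx2 <- sample.int(N, m)
```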