Reputation: 11
I have a dataset with 5 columns and 24347 observations. I want to generate 10 random datasets from the master dataset. I am using the following code, but I am unable to generate multiple datasets.
iterations =10
variables = 5
output_i <- matrix(ncol=variables, nrow=iterations)
for(i in 1:iterations){
output_i <- newdata[sample(nrow(newdata), 100),]
}
Upvotes: 1
Views: 3488
Reputation: 31452
A more "R" way to do this is to ditch the for loop in favour of lapply:
sample_data_list <- lapply(1:iterations, function(i) newdata[sample(1:nrow(newdata), 100),])
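As a self-contained sketch of the lapply approach: the newdata below is a made-up stand-in for the asker's data (only its shape, 5 columns and 24347 rows, comes from the question), and the sample size of 100 is taken from the question's code.

```r
# Hypothetical stand-in for the asker's newdata (5 columns, 24347 rows)
newdata <- data.frame(matrix(rnorm(5 * 24347), ncol = 5))
iterations <- 10

# One sampled data frame of 100 rows per iteration, collected in a list
sample_data_list <- lapply(seq_len(iterations), function(i) {
  newdata[sample(nrow(newdata), 100), ]
})

length(sample_data_list)     # number of samples in the list
dim(sample_data_list[[1]])   # rows and columns of the first sample
```

Each list element is an independent draw, so sample_data_list[[1]] and sample_data_list[[2]] will generally contain different rows.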
Upvotes: 2
Reputation: 5689
I think your best bet is to make a list of data frames rather than your approach using a for loop. We can do this using replicate(), which uses lapply() internally (via sapply()).
First, let's create a dummy data frame df that mimics your data, with 5 columns and 24,347 observations:
df <- data.frame(a = rnorm(24347),
                 b = rnorm(24347),
                 c = rnorm(24347),
                 d = rnorm(24347),
                 e = rnorm(24347))
Next, set the number of iterations you want, and how big each subset sample should be:
iterations <- 10
subset_size <- 100
Finally, create a list of sampled data frames:
samples_list <- replicate(n = iterations,
                          expr = {df[sample(nrow(df), subset_size), ]},
                          simplify = FALSE)
This repeats the expression df[sample(nrow(df), subset_size),] for however many iterations you desire and places each newly created data frame in the list samples_list.
You access the data frames just like you would access any other list element:
samples_list[[1]]
Just remember the double brackets around your data frame element: single brackets, as in samples_list[1], return a list of length one rather than the data frame itself. From here, you can access any particular row or column as normal:
samples_list[[dataframe]][row,column]
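To make the indexing concrete, here is a small sketch. The two-column df, the three samples and the sample size of 10 are made-up values chosen only to keep the output short; they are not from the question.

```r
# Small hypothetical data frame and three samples of 10 rows each
df <- data.frame(a = rnorm(1000), b = rnorm(1000))
samples_list <- replicate(3, df[sample(nrow(df), 10), ], simplify = FALSE)

class(samples_list[1])     # "list": single brackets return a sub-list
class(samples_list[[1]])   # "data.frame": double brackets return the element

first_sample <- samples_list[[1]]
first_sample[2, "b"]       # row 2, column b of the first sample
samples_list[[3]][1:5, ]   # first five rows of the third sample
```

Note that sampled rows keep their original row names, so the positional index 2 above refers to the second row of the sample, not to row 2 of df.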
If you need more info on lists, I would head over to this post: https://stackoverflow.com/a/24376207/6535514
Upvotes: 0
Reputation: 1538
You cannot iterate over i, assign to a variable called output_i, and expect the variable name to change on each iteration: output_i is a literal name, so the i in it is not replaced by the loop counter.
I suggest that you use a list to hold the output_i objects.
See code below:
iterations =10
newdata <- matrix(1:(5*24347),ncol=5, nrow=24347)
sample_data_list <- list()
for(i in 1:iterations){
sample_data_list[[i]] <- newdata[sample(1:nrow(newdata), 100),]
}
This will generate a list of 10 different samples of 100 observations from the original data.
> str(sample_data_list)
List of 10
$ : int [1:100, 1:5] 8788 21165 14054 2762 10288 3319 8175 6494 17935 2865 ...
$ : int [1:100, 1:5] 16351 15621 5455 23679 22460 4283 15251 1008 21474 19218 ...
$ : int [1:100, 1:5] 16814 21784 9937 5673 8699 7887 23739 3382 429 2550 ...
$ : int [1:100, 1:5] 21479 12247 8417 7963 14565 4513 3461 10996 16986 8029 ...
$ : int [1:100, 1:5] 22685 18552 21278 17930 954 9223 17894 343 4677 15571 ...
$ : int [1:100, 1:5] 13486 3516 5155 1617 16324 15705 12960 12154 20426 1124 ...
$ : int [1:100, 1:5] 10118 56 2950 12234 953 9479 11098 14272 24303 7672 ...
$ : int [1:100, 1:5] 1621 12303 14894 718 20877 1682 16234 7019 7926 11954 ...
$ : int [1:100, 1:5] 915 2957 14657 21297 13652 6750 11996 3621 23321 21818 ...
$ : int [1:100, 1:5] 11654 20698 5739 6693 6840 10384 20068 10571 18353 5123 ...
Upvotes: 0
Reputation: 680
Use a list instead. In that example you are overwriting output_i on every pass of the loop.
output <- list()
for(i in 1:iterations){
output[[i]] <- newdata[sample(nrow(newdata), 100),]
}
Your first sample will be the first element of the list, output[[1]], the second will be output[[2]], and so on.
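The overwriting can be seen by running both patterns side by side. This is only a sketch: the integer matrix used for newdata is a stand-in for the real data, and preallocating the list with vector("list", iterations) is an optional variation on the empty list() above.

```r
# Stand-in data with the same shape as in the question
newdata <- matrix(1:(5 * 24347), ncol = 5)
iterations <- 10

# Original pattern: the same name is reassigned on every pass
for (i in 1:iterations) {
  output_i <- newdata[sample(nrow(newdata), 100), ]
}
dim(output_i)    # a single 100 x 5 sample: only the last one survives

# List pattern: each pass writes to its own slot
output <- vector("list", iterations)
for (i in seq_len(iterations)) {
  output[[i]] <- newdata[sample(nrow(newdata), 100), ]
}
length(output)   # one element per iteration
```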
Upvotes: 4