britney

Reputation: 69

How do I generate a batch of tables with randomly sampled rows from a larger table?

I have a table containing 6,800,000 rows and 35 columns. I want to generate a batch of 34 tables containing 200,000 rows each. Previously, I've tried:

library(data.table)
table <- fread("dataset.preimp")
table_1 <- table[sample(nrow(table), size = 200000, replace = FALSE), ]

This generates a table with 200,000 randomly sampled rows. If I want to make a second table, also with 200,000 randomly sampled rows, that excludes the rows already included in the first table, how would I do that?

Upvotes: 2

Views: 104

Answers (1)

Axeman

Reputation: 35297

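Assign each row a random table ID, then split the table into a list of tables, so that each row appears in exactly one table. A small illustration with mtcars (32 rows, split into 4 tables of 8 rows each):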

table_ids <- sample(rep(1:4, each = 8))
split(mtcars, table_ids)
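
As a quick sanity check on the toy example, one could confirm that the four pieces are equal in size and that no row is used twice. This is only a sketch: the names pieces, sizes, and all_rows are introduced here for illustration, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# mtcars has 32 rows, so 4 tables of 8 rows cover it exactly
table_ids <- sample(rep(1:4, each = 8))
pieces <- split(mtcars, table_ids)

# Number of rows in each piece
sizes <- vapply(pieces, nrow, integer(1))

# Row names of all pieces combined; split() keeps the original row names
all_rows <- unlist(lapply(pieces, rownames), use.names = FALSE)

stopifnot(all(sizes == 8))                       # equal-sized pieces
stopifnot(anyDuplicated(all_rows) == 0)          # no row used twice
stopifnot(setequal(all_rows, rownames(mtcars)))  # every row used
```

The individual tables can then be pulled out of the list by position or by name, for example pieces[[1]] or pieces[["1"]].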

For your example:

table_ids <- sample(rep(1:34, each = 200000))
table_list <- split(table, table_ids)
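
If only a few non-overlapping samples are needed rather than a full partition, one alternative (not the approach above) is to shuffle the row indices once and take consecutive slices of the shuffled vector. A minimal sketch on a made-up 20-row table; small, shuffled, and take_chunk are hypothetical names used only for illustration:

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical stand-in for the large table
small <- data.frame(id = 1:20, value = rnorm(20))
chunk_size <- 5

# Shuffle all row indices once; disjoint slices of this vector cannot overlap
shuffled <- sample(nrow(small))

# Return the k-th chunk of chunk_size randomly ordered rows
take_chunk <- function(k) {
  rows <- shuffled[((k - 1) * chunk_size + 1):(k * chunk_size)]
  small[rows, , drop = FALSE]
}

sample_1 <- take_chunk(1)
sample_2 <- take_chunk(2)  # shares no rows with sample_1
```

With a data.table, indexing the table by the same row vector should select the same rows. For the full set of 34 tables, the split() call above is likely the more convenient route.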

Upvotes: 3
