Vesuccio

Reputation: 607

Need to randomly sample a data set with multiple groups each with multiple factors

UPDATED QUESTION: I neglected to include one important aspect in my original question. The code provided by @Jthorpe works great with one column of STUFF. However, depending on my data set, I will have between 1 and 70 columns to randomly sample at once; my updated example includes 3 columns of STUFF. So I need to group_by SITE and DATE and then, within each group, randomly sample whole rows across all of the STUFF columns at once. Note that the RESULT table keeps the values of each sampled row together across the STUFF columns. For example, the first two rows of the RESULT table are both 2, 4, 8, which corresponds to row 2 of the DATA table. I hope this is clear. Thanks again.

Original question: I need to pseudo-replicate a data set that can have multiple groups, and each group can have multiple factors. I have written code that uses for loops to subset the data set, randomly sample each subset, and then reassemble the resampled subsets into a new table. I would like more elegant and flexible code. I have tried dplyr (e.g., the group_by and sample_n functions), but I am having trouble getting the code to handle groups of different lengths. I have attached an example data set and the desired result. Thanks in advance for any help.

DATA = data.frame(SITE = c("A","A","A","A","B","B","B","C","C"), 
                  DATE = c("1","1","2","2","3","3","3","4","4"), 
                  STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                  STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000),
                  STUFF3 = c(4, 8, 120, 160, 400, 800, 1200, 20000, 24000))



RESULT = data.frame(SITE = c("A","A","A","A","B","B","B","C","C"), 
                    DATE = c("1","1","2","2","3","3","3","4","4"), 
                    STUFF = c(2, 2, 30, 30, 200, 300, 300, 6000, 5000),
                    STUFF2 = c(4, 4, 60, 60, 400, 600, 600, 12000, 10000),
                    STUFF3 = c(8, 8, 120, 120, 800, 1200, 1200, 24000, 20000))
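A base R sketch of the row-preserving resample described above, assuming the goal is a with-replacement draw of as many rows as each SITE/DATE group has (as in RESULT). It samples row indices within each group and then indexes DATA with them, so every STUFF column of a drawn row moves together regardless of how many STUFF columns there are. The helper name `resample_rows` is hypothetical, and `set.seed(42)` is only there to make the draw reproducible.

```r
# Example data from the question
DATA <- data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
                   DATE = c("1","1","2","2","3","3","3","4","4"),
                   STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                   STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000),
                   STUFF3 = c(4, 8, 120, 160, 400, 800, 1200, 20000, 24000))

# Hypothetical helper: resample whole rows within groups, with replacement.
# `groups` is a data frame (or list) of grouping columns.
resample_rows <- function(df, groups) {
  idx <- seq_len(nrow(df))
  # Replace each group's row indices with a same-size draw (with replacement)
  # from that group's own indices; drop = TRUE skips combinations that never occur.
  split(idx, groups, drop = TRUE) <- lapply(
    split(idx, groups, drop = TRUE),
    function(i) i[sample.int(length(i), replace = TRUE)]
  )
  out <- df[idx, , drop = FALSE]  # all columns of a drawn row travel together
  rownames(out) <- NULL
  out
}

set.seed(42)  # only for a reproducible draw
RESULT <- resample_rows(DATA, DATA[c("SITE", "DATE")])
print(RESULT)
```

Each output position is filled with a row from its own group, so the SITE and DATE columns line up with DATA exactly. Indexing with `i[sample.int(length(i), ...)]` instead of `sample(i, ...)` also avoids `sample()`'s habit of drawing from `1:i` when a group contains a single row.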

Upvotes: 1

Views: 1596

Answers (2)

David Arenburg

Reputation: 92282

Here's a simple data.table approach:

library(data.table)
# index by sample(.N) so a one-row group is not resampled from 1:STUFF
setDT(DATA)[, STUFF[sample(.N, replace = TRUE)], by = .(SITE, DATE)]
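The one-column approach above can be extended to any number of STUFF columns by sampling row positions with `.N` and subsetting `.SD` (all non-grouping columns), instead of sampling each column on its own. A sketch, assuming the data.table package is installed; the seed is only for reproducibility.

```r
library(data.table)

# Example data from the question
DATA <- data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
                   DATE = c("1","1","2","2","3","3","3","4","4"),
                   STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                   STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000),
                   STUFF3 = c(4, 8, 120, 160, 400, 800, 1200, 20000, 24000))

set.seed(42)  # only for a reproducible draw
# sample(.N, replace = TRUE) draws .N row positions within the current group;
# .SD[...] then returns those rows for every non-grouping column at once.
RESULT <- setDT(DATA)[, .SD[sample(.N, replace = TRUE)], by = .(SITE, DATE)]
print(RESULT)
```

Note that `setDT()` converts DATA to a data.table by reference; use `as.data.table(DATA)` instead if the original data frame should stay untouched.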

Upvotes: 1

Jthorpe

Reputation: 10167

A dplyr solution:

library(dplyr)
RESULT <- DATA %>% group_by(SITE, DATE) %>% mutate(STUFF = STUFF[sample.int(n(), replace = TRUE)])
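Applying the same `mutate()` to several columns (for example with `across()`) would call the sampler once per column, so each column would be shuffled independently and the row alignment the updated question asks for would be lost. Sampling whole rows per group keeps the columns together. A sketch, assuming dplyr 1.0.0 or later (which provides `slice_sample()`); the seed is only for reproducibility.

```r
library(dplyr)

# Example data from the question
DATA <- data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
                   DATE = c("1","1","2","2","3","3","3","4","4"),
                   STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                   STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000),
                   STUFF3 = c(4, 8, 120, 160, 400, 800, 1200, 20000, 24000))

set.seed(42)  # only for a reproducible draw
RESULT <- DATA %>%
  group_by(SITE, DATE) %>%
  # prop = 1 draws as many rows as the group has; replace = TRUE allows repeats
  slice_sample(prop = 1, replace = TRUE) %>%
  ungroup()
print(RESULT)
```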

Upvotes: 4
