Reputation: 607
UPDATED QUESTION
I neglected to include one important aspect in my original question. The code provided by @Jthorpe works great with one column of STUFF. However, depending on my data set, I will have between 1 and 70 columns to randomly sample at once. In my updated example I have included 3 columns of STUFF. So I need to group_by SITE and DATE and then randomly sample a single row from multiple columns of STUFF at once. Please note how the RESULT table retains the order of data across the columns of STUFF. For example, the first two rows in the RESULT table are both 2, 4, 8, which corresponds to row 2 in the DATA table. I hope this is clear. Thanks again.
ORIGINAL QUESTION
I need to pseudo-replicate a data set that could have multiple groups. Additionally, each group could have multiple factors. I have written code using for loops to subset the data set, then randomly sample the subset, and then reassemble the resampled data set into a new table. I would like to use some more elegant and flexible code. I have tried using dplyr (e.g., the group_by and sample_n functions), but am having trouble getting the code to properly deal with the variable lengths in factors. I have attached an example data set and the desired result. Thanks in advance for any help.
DATA = data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
                  DATE = c("1","1","2","2","3","3","3","4","4"),
                  STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                  STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000),
                  STUFF3 = c(4, 8, 120, 160, 400, 800, 1200, 20000, 24000))
RESULT = data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
                    DATE = c("1","1","2","2","3","3","3","4","4"),
                    STUFF = c(2, 2, 30, 30, 200, 300, 300, 6000, 5000),
                    STUFF2 = c(4, 4, 60, 60, 400, 600, 600, 12000, 10000),
                    STUFF3 = c(8, 8, 120, 120, 800, 1200, 1200, 24000, 20000))
Upvotes: 1
Views: 1596
Reputation: 92282
Here's a simple data.table approach:
library(data.table)
setDT(DATA)[, sample(STUFF, replace = TRUE), by = .(SITE, DATE)]
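The line above resamples a single column. To keep values aligned across several STUFF columns, one option is to sample row positions within each group and return whole rows. The following is a minimal sketch, assuming the DATA frame from the question; the set.seed call is only there to make the draw reproducible. Sampling positions with sample(.N, ...) also sidesteps a known quirk of base R's sample(): when given a single number x >= 1 it samples from 1:x, which would matter for any group containing only one row.

```r
library(data.table)

# Example data copied from the question
DATA <- data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
                   DATE = c("1","1","2","2","3","3","3","4","4"),
                   STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                   STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000),
                   STUFF3 = c(4, 8, 120, 160, 400, 800, 1200, 20000, 24000))

set.seed(1)  # assumption: fixed seed, only for reproducibility

DT <- as.data.table(DATA)

# .N is the number of rows in the current group.
# sample(.N, replace = TRUE) draws .N row positions with replacement,
# and .SD[...] returns those rows with all non-grouping columns,
# so STUFF, STUFF2 and STUFF3 always come from the same original row.
RESULT <- DT[, .SD[sample(.N, replace = TRUE)], by = .(SITE, DATE)]
```

Because .SD holds every column not named in by, the same call should work unchanged whether there is 1 STUFF column or 70.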
Upvotes: 1
Reputation: 10167
A dplyr solution:
library(dplyr)
RESULT <- DATA %>% group_by(SITE, DATE) %>% mutate(STUFF = sample(STUFF, replace = TRUE))
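For several STUFF columns that must stay aligned, mutate on one column is not enough, since each column would need the same draw. One possible adaptation is to resample row positions per group with slice(). This is a sketch under the assumption that the DATA frame is the one from the question; the seed is only for reproducibility.

```r
library(dplyr)

# Example data copied from the question
DATA <- data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
                   DATE = c("1","1","2","2","3","3","3","4","4"),
                   STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                   STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000),
                   STUFF3 = c(4, 8, 120, 160, 400, 800, 1200, 20000, 24000))

set.seed(1)  # assumption: fixed seed, only for reproducibility

RESULT <- DATA %>%
  group_by(SITE, DATE) %>%
  # n() is the group size; draw that many row positions with replacement
  # and keep the corresponding whole rows, so all columns stay in step
  slice(sample(n(), replace = TRUE)) %>%
  ungroup()
```

Each group keeps its original number of rows, and every output row is a complete row of DATA, so the number of STUFF columns does not need to be known in advance.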
Upvotes: 4