I have a vector of counts which I want to resample with replacement in R:
X350277 128
X193233 301
X514940 3715
X535375 760
X953855 50
X357046 236
X196664 460
X589071 898
X583656 670
X583117 1614
(Note: the second column holds the counts; the first column is the object the counts refer to.)
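For a reproducible example, the table above can be rebuilt as a data frame. The column names `id` and `count` are assumptions (the question does not name them), chosen because the answer's code refers to `dat$count`:

```r
# Rebuild the count table from the question as a data frame.
# Column names `id` and `count` are assumptions, not given in the question.
dat <- data.frame(
  id = c("X350277", "X193233", "X514940", "X535375", "X953855",
         "X357046", "X196664", "X589071", "X583656", "X583117"),
  count = c(128, 301, 3715, 760, 50, 236, 460, 898, 670, 1614)
)

sum(dat$count)  # total number of underlying observations: 8832
```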
From reading various documentation, it seems easy to resample data where each row or column represents a single observation. But how do I do this when each row represents multiple observations summed together (as in a table of counts)?
You can use weighted sampling, drawing rows with probability proportional to their counts (as user20650 also mentioned in the comments):
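sample_weights <- dat$count / sum(dat$count)  # each row's share of the total count
# 1000 draws here; use sum(dat$count) to match the original number of observations
mysample <- dat[sample(seq_len(nrow(dat)), 1000, replace = TRUE, prob = sample_weights), ]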
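A self-contained sketch of the weighted approach that also collapses one resample back into a count table. It assumes the hypothetical column names `id` and `count`, and draws `sum(dat$count)` observations, the usual size for a bootstrap replicate (the 1000 in the answer is just an example size):

```r
# Count table from the question; column names `id` and `count` are assumptions.
dat <- data.frame(
  id = c("X350277", "X193233", "X514940", "X535375", "X953855",
         "X357046", "X196664", "X589071", "X583656", "X583117"),
  count = c(128, 301, 3715, 760, 50, 236, 460, 898, 670, 1614)
)

set.seed(1)  # for reproducibility only

n_total <- sum(dat$count)  # resample as many observations as the original data holds

# Draw row indices with replacement, weighted by each row's count.
# sample() rescales `prob` internally, so the raw counts can be passed directly.
idx <- sample(seq_len(nrow(dat)), size = n_total, replace = TRUE, prob = dat$count)

# Tally the draws per row; tabulate() with nbins keeps never-drawn rows as 0.
boot <- data.frame(id = dat$id, count = tabulate(idx, nbins = nrow(dat)))
sum(boot$count)  # equals n_total
```

Repeating the last three statements (without `set.seed`) gives further bootstrap replicates.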
A less efficient approach, which may still be useful depending on what you want to do, is to expand your data back into 'long' format (one row per observation):
dat_large <- dat[rep(seq_len(nrow(dat)), dat$count), ]  # one row per observation
# then sampling is easy
mysample <- dat_large[sample(seq_len(nrow(dat_large)), 1000, replace = TRUE), ]
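If you only need the resampled counts and not individual rows, a different technique (not from the answer above) is to draw them directly from a multinomial distribution: tallying n draws made with replacement, where each draw lands in row i with probability count_i / n, yields a Multinomial(n, p) vector. A minimal sketch with base R's `rmultinom()`, assuming the counts are stored in a named vector (the name `counts` is hypothetical):

```r
# Counts from the question as a named vector (the name `counts` is hypothetical).
counts <- c(X350277 = 128, X193233 = 301, X514940 = 3715, X535375 = 760,
            X953855 = 50, X357046 = 236, X196664 = 460, X589071 = 898,
            X583656 = 670, X583117 = 1614)

set.seed(1)  # for reproducibility only

# One bootstrap replicate: Multinomial(size = total count, prob = counts).
# rmultinom() rescales `prob` to sum to 1, so raw counts can be passed in.
boot_counts <- rmultinom(1, size = sum(counts), prob = counts)[, 1]
names(boot_counts) <- names(counts)  # set names explicitly

# Many replicates at once: each column of the K x B matrix is one resampled table.
B <- 500
boot_matrix <- rmultinom(B, size = sum(counts), prob = counts)
```

This avoids building the long table entirely, so memory use does not grow with the total count.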