jds


How to Bootstrap Resample Count Data in R

I have a vector of counts which I want to resample with replacement in R:

X350277  128
X193233  301
X514940 3715
X535375  760
X953855   50
X357046  236
X196664  460
X589071  898
X583656  670
X583117 1614

(Note that the second column holds the counts and the first column is the object the counts represent.)

From reading various documentation, it seems easy to resample data where each row or column represents a single observation. But how do I do this when each row represents multiple observations summed together, as in a table of counts?
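To try the answers below, the table above can be read straight into R. This is a minimal sketch: the column names `id` and `count` are hypothetical labels chosen for illustration, although `count` matches the name the answer's code assumes. The row total is the number of individual observations the table summarises.

```r
# Hypothetical column names: "id" for the object, "count" for how often it was observed
dat <- read.table(text = "
X350277  128
X193233  301
X514940 3715
X535375  760
X953855   50
X357046  236
X196664  460
X589071  898
X583656  670
X583117 1614
", col.names = c("id", "count"), stringsAsFactors = FALSE)

# Each row summarises many observations; their total is the original sample size
n_obs <- sum(dat$count)
n_obs
```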


Answers (1)

Heroka

You can use weighted sampling (as user20650 also mentioned in the comments):

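# Sampling probabilities proportional to the counts
sample_weights <- dat$count / sum(dat$count)
# Draw 1000 row indices with replacement, weighted by count
mysample <- dat[sample(seq_len(nrow(dat)), 1000, replace = TRUE, prob = sample_weights), ]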
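The weighted draw above returns 1000 rows of `dat`. For a bootstrap replicate of the original data you would usually draw as many observations as the table contains, `sum(dat$count)`, and then tabulate the draws back into counts. A minimal sketch, assuming `dat` has columns `id` and `count` (`id` is a hypothetical name; only `count` appears in the answer). Note that `sample()` normalises `prob` itself, so the raw counts can be passed directly.

```r
dat <- data.frame(
  id    = c("X350277", "X193233", "X514940", "X535375", "X953855",
            "X357046", "X196664", "X589071", "X583656", "X583117"),
  count = c(128, 301, 3715, 760, 50, 236, 460, 898, 670, 1614)
)

set.seed(1)               # for reproducibility
n_obs <- sum(dat$count)   # resample the same number of observations as the original

# Draw row indices with probability proportional to each row's count
idx <- sample(seq_len(nrow(dat)), size = n_obs, replace = TRUE, prob = dat$count)

# Tabulate the draws back into counts; fixed factor levels keep rows that were never drawn
boot_counts <- as.vector(table(factor(idx, levels = seq_len(nrow(dat)))))
boot_dat <- data.frame(id = dat$id, count = boot_counts)
boot_dat
```

`boot_dat` has the same shape as the original table, so whatever statistic you compute on `dat` can be computed on it unchanged.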

A less efficient approach, which may still be useful depending on what you want to do, is to expand your data back to 'long' format, with one row per individual observation:

# Repeat each row 'count' times: one row per individual observation
dat_large <- dat[rep(seq_len(nrow(dat)), dat$count), ]

# Then sampling is easy
mysample <- dat_large[sample(seq_len(nrow(dat_large)), 1000, replace = TRUE), ]
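Either approach gives one resample; a bootstrap distribution comes from repeating it many times. The sketch below uses `rmultinom()` as a shortcut, a swapped-in technique rather than anything from the answer: the counts obtained by resampling `sum(count)` observations with replacement follow a multinomial distribution with probabilities `count / sum(count)`, so each column of `rmultinom()`'s output is one bootstrap table of counts. The statistic (the share of `X514940`) and `B = 2000` are arbitrary choices for illustration.

```r
dat <- data.frame(
  id    = c("X350277", "X193233", "X514940", "X535375", "X953855",
            "X357046", "X196664", "X589071", "X583656", "X583117"),
  count = c(128, 301, 3715, 760, 50, 236, 460, 898, 670, 1614)
)

set.seed(42)
B <- 2000                 # number of bootstrap replicates (arbitrary)
n_obs <- sum(dat$count)

# 10 x B matrix: column j is the j-th bootstrap table, rows follow dat's row order
boot_mat <- rmultinom(B, size = n_obs, prob = dat$count)

# Example statistic: share of all observations that belong to X514940
target <- which(dat$id == "X514940")
boot_share <- boot_mat[target, ] / n_obs

sd(boot_share)                          # bootstrap standard error
quantile(boot_share, c(0.025, 0.975))   # 95% percentile interval
```

This avoids building `dat_large` entirely, which matters when the counts are large.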
