Resampling entries that share the same code (ID)

Question

In R, I'm trying to resample my dataset.

My data matrix A has an integer code in the first column and the characteristics of each row in the remaining columns, for example:

# column 1: integer codes 1..100; columns 2-21: 20 characteristics per row
A <- cbind(floor(runif(1000, 1, 101)), matrix(rexp(20000, rate = 0.1), ncol = 20))

Some codes are repeated in the first column.

I want to randomly resample codes from the first column and build a new matrix or data frame that contains, for each entry in the resampled code vector, the corresponding rows of A (the right-hand side). If several rows share the same resampled code, all of them should be included. And if the same code is drawn twice, all rows of A with that code should appear twice.
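Stated precisely, the requirement is a many-to-many join between the drawn codes and the rows of A: each drawn code is paired with every row that carries it, so a code drawn twice brings its rows in twice. A minimal sketch using base R's merge(), assuming sampling with replacement and converting A to a data frame whose first column is given the hypothetical name code:

```r
set.seed(42)  # reproducible draws for the illustration
A <- cbind(floor(runif(1000, 1, 101)), matrix(rexp(20000, rate = 0.1), ncol = 20))

# Data frame copy of A; "code" is an illustrative name for the ID column
df <- as.data.frame(A)
names(df)[1] <- "code"

# Draw codes with replacement, one draw per distinct code
codes <- unique(df$code)
res <- codes[sample.int(length(codes), length(codes), replace = TRUE)]

# Every occurrence of a code in res is matched with every row of df carrying it.
# sort = FALSE skips sorting by code; the row order is then unspecified.
df.new <- merge(data.frame(code = res), df, by = "code", sort = FALSE)
```

The number of rows in df.new equals the sum, over all draws, of how many rows of A carry the drawn code. merge() does not preserve the order of res, which is usually irrelevant for bootstrap statistics.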

---EDIT---

The resampling is done with replacement. So far I have tried:

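# resample() is not a base R function; base sample() draws with replacement here
res <- sample(unique(A[,1]), size = length(unique(A[,1])), replace = TRUE)
# %in% only tests membership, so each matching row is kept once
A.new <- A[which(A[,1] %in% res),]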

However, suppose two rows of A have the same code (say 2) and res contains 2 four times. A.new then contains those two rows only once each (because there are two rows coded 2 in A[,1]), instead of the two rows repeated four times, i.e. eight rows.
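A small reproduction of the problem, using a hypothetical four-row matrix A_toy in which code 2 appears on two rows. Drawing 2 four times still keeps each matching row once with %in%, while the desired size is one copy of the two-row block per draw:

```r
# Hypothetical toy data: code 2 is shared by two rows
A_toy <- cbind(code = c(1, 2, 2, 3), value = c(10, 20, 21, 30))
res_toy <- c(2, 2, 2, 2)  # code 2 drawn four times

# Membership test: duplicates in res_toy make no difference
kept <- A_toy[A_toy[, 1] %in% res_toy, , drop = FALSE]
nrow(kept)  # 2

# Desired size: for every draw, add the number of rows carrying that code
counts <- table(A_toy[, 1])
sum(counts[as.character(res_toy)])  # 8
```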

dww · Accepted Answer

We can do it like this:

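# lapply always returns a list (sapply may simplify it to a matrix, which breaks do.call)
A.new <- lapply(res, function(x) A[A[,1] == x, , drop = FALSE])
A.new <- do.call(rbind, A.new)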

The first line builds a list with one element per value of res: the subset of A whose first column equals that value. If res contains the same value more than once, a separate matrix is created for each occurrence, so those rows appear once per draw.

The second line uses do.call(rbind, ...) to stack this list into a single matrix.
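Because each draw only needs the row numbers of its code, the same matrix can also be built by grouping row indices once with split() and indexing A a single time, instead of scanning A once per draw. A sketch of this index-based variant (the names rows_by_code and idx are illustrative); rows come out in the order of res, with each code's rows in their original order:

```r
set.seed(1)  # reproducible draws for the illustration
A <- cbind(floor(runif(1000, 1, 101)), matrix(rexp(20000, rate = 0.1), ncol = 20))

# Draw positions with sample.int() to avoid sample()'s special case for a single number
codes <- unique(A[, 1])
res <- codes[sample.int(length(codes), length(codes), replace = TRUE)]

# Row numbers of A grouped by code; list names are the codes as strings
rows_by_code <- split(seq_len(nrow(A)), A[, 1])

# Look up the block for every draw (repeated draws repeat the block) and flatten
idx <- unlist(rows_by_code[as.character(res)], use.names = FALSE)
A.new <- A[idx, , drop = FALSE]
```

Keeping idx around is also convenient if the same resample has to be applied to another object aligned row-by-row with A.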
