Ragy Isaac

Reputation: 1458

Bootstrap a large data set

I would like to bootstrap a large data set that contains many column and row variables. The following is a simplified re-creation of my data set:

charDataDiff <- data.frame(c('A','B','C'), matrix(1:72, nrow=9))
colnames(charDataDiff) <- c("patchId","s380","s390","s400","s410","s420","s430","s440","s450")

Split the data using patchId as the criterion. This creates a list of three data frames, one for each patch:

idColor <- c("A", "B", "C")
(patchSpectrum <- lapply(idColor, function(id) charDataDiff[charDataDiff$patchId == id, ]))
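Base R's split() does the same grouping in a single call and also names each element by its patchId, so a patch can be looked up as patchSpectrum[["C"]] instead of by position. A minimal sketch that recreates the example data so it runs on its own:

```r
# Recreate the example data from the question
charDataDiff <- data.frame(c('A','B','C'), matrix(1:72, nrow = 9))
colnames(charDataDiff) <- c("patchId", "s380", "s390", "s400", "s410",
                            "s420", "s430", "s440", "s450")

# split() groups the rows by patchId and returns a named list of data frames
patchSpectrum <- split(charDataDiff, charDataDiff$patchId)

names(patchSpectrum)         # "A" "B" "C"
nrow(patchSpectrum[["C"]])   # 3 rows belong to patch C
```

Unlike the lapply() version, this needs no separate idColor vector; the group names come from the data.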

I created the function sampleBoot to resample the rows of one patch in patchSpectrum with replacement:

sampleBoot <- function(nbootstrap = 2, patch = 3) {
    # resample the rows of one patch (read from the global patchSpectrum) with replacement
    x <- patchSpectrum[[patch]]
    lapply(1:nbootstrap, function(i) x[sample(1:nrow(x), replace = TRUE), ])
}
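sampleBoot reads patchSpectrum from the global environment, which makes it awkward to reuse. A sketch of a variant that takes the data frame as an argument instead; the name bootRows and its arguments are my own (hypothetical), not part of the original code:

```r
# Recreate the question's data and per-patch list
charDataDiff <- data.frame(c('A','B','C'), matrix(1:72, nrow = 9))
colnames(charDataDiff) <- c("patchId", "s380", "s390", "s400", "s410",
                            "s420", "s430", "s440", "s450")
idColor <- c("A", "B", "C")
patchSpectrum <- lapply(idColor, function(id) charDataDiff[charDataDiff$patchId == id, ])

# Hypothetical helper: nbootstrap resamples of the rows of df, drawn with replacement.
# Unlike sampleBoot, it receives the data as an argument instead of reading a global.
bootRows <- function(df, nbootstrap = 2) {
  lapply(seq_len(nbootstrap), function(i) df[sample(nrow(df), replace = TRUE), ])
}

set.seed(1)  # make the resampling reproducible
boots <- bootRows(patchSpectrum[[3]], nbootstrap = 5)  # same as sampleBoot(5, 3)
```

Because the data is an argument, the same helper covers every patch, e.g. lapply(patchSpectrum, bootRows, nbootstrap = 5).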

Example:

sampleBoot(5,3)

Here is where I am stuck:

  1. I need to resample the rows of each patchId data frame across all column variables (which sampleBoot above already does),
  2. take the median of each column in every bootstrap replicate, and
  3. collect those medians into a new population from which to calculate parametric statistics. I can do this manually, but that would be silly.
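The three steps can be sketched for all patches at once by nesting the resampling inside lapply(). This assumes "parametric parameters" means the mean and standard deviation of the bootstrap medians, which is my guess; the helper name bootMedians and B = 1000 replicates are illustrative choices, not from the question:

```r
# Recreate the example data and split it by patchId into a named list
charDataDiff <- data.frame(c('A','B','C'), matrix(1:72, nrow = 9))
colnames(charDataDiff) <- c("patchId", "s380", "s390", "s400", "s410",
                            "s420", "s430", "s440", "s450")
patchSpectrum <- split(charDataDiff, charDataDiff$patchId)

# Hypothetical helper (steps 1 and 2): B bootstrap medians of every spectral column.
# Returns a B x (number of columns) matrix, one row per resample.
bootMedians <- function(df, B = 1000) {
  vals <- df[-1]                                # drop the patchId column
  t(replicate(B, {
    idx <- sample(nrow(vals), replace = TRUE)   # resample rows with replacement
    apply(vals[idx, , drop = FALSE], 2, median) # median of each column
  }))
}

set.seed(42)
medianPop <- lapply(patchSpectrum, bootMedians, B = 1000)

# Step 3 (assumed): mean and sd of each patch's population of medians
summaries <- lapply(medianPop, function(m)
  rbind(mean = colMeans(m), sd = apply(m, 2, sd)))
summaries$C
```

replicate() returns one column per replicate, so t() flips it to one row per resample, matching the layout of the answer's do.call(rbind, ...) table.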

Upvotes: 0

Views: 1095

Answers (1)

Ali

Reputation: 9830

As far as I understand your question, you can do the following:

do.call(rbind, lapply(sampleBoot(5, 3), function(x) apply(x[-1], 2, median)))

This creates a matrix of the column medians from 5 bootstrap samples of patch 3, with one row per sample and one column per variable.
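The rows of that matrix are the population of medians from step 3, so column-wise summaries of it give the statistics the question asks for. A sketch, assuming the mean, the bootstrap standard error, and a 95% percentile interval are what is wanted; it recreates the question's objects and uses 1000 resamples instead of 5:

```r
# Recreate the question's objects
charDataDiff <- data.frame(c('A','B','C'), matrix(1:72, nrow = 9))
colnames(charDataDiff) <- c("patchId", "s380", "s390", "s400", "s410",
                            "s420", "s430", "s440", "s450")
idColor <- c("A", "B", "C")
patchSpectrum <- lapply(idColor, function(id) charDataDiff[charDataDiff$patchId == id, ])
sampleBoot <- function(nbootstrap = 2, patch = 3) {
  x <- patchSpectrum[[patch]]
  lapply(1:nbootstrap, function(i) x[sample(1:nrow(x), replace = TRUE), ])
}

set.seed(1)
# The answer's one-liner: one row of column medians per bootstrap resample
meds <- do.call(rbind, lapply(sampleBoot(1000, 3), function(x) apply(x[-1], 2, median)))

# Summaries of the bootstrap distribution of each column's median
bootMean <- colMeans(meds)
bootSE   <- apply(meds, 2, sd)                                # bootstrap standard error
bootCI   <- apply(meds, 2, quantile, probs = c(0.025, 0.975)) # percentile interval
rbind(mean = bootMean, se = bootSE, bootCI)
```

With only three rows per patch in this toy data, each resampled median can take just three values, so the interval is coarse; on the real data set the distribution will be smoother.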

Upvotes: 1
