B. Davis
B. Davis

Reputation: 3441

Function to bootstrap sample and generate standard error (SE)

I have a 100 probabilities that are associated with four unique individuals (AAA:DDD) that I have created and displayed here.

IndID <- as.factor(rep(c("AAA", "BBB", "CCC", "DDD"),25))
Prob <- runif(length(IndID),0,1)
Data <- data.frame(IndID, Prob)
Data <- Data[order(Data$IndID),]

> head(Data)
   IndID      Prob
1    AAA 0.5860417
5    AAA 0.1824266
9    AAA 0.3301014
13   AAA 0.5048122
17   AAA 0.3717195
21   AAA 0.9090825

> summary(Data)
 IndID         Prob        
 AAA:25   Min.   :0.01341  
 BBB:25   1st Qu.:0.19743  
 CCC:25   Median :0.48315  
 DDD:25   Mean   :0.50475  
          3rd Qu.:0.81789  
          Max.   :0.99805 

I want to bootstrap (sample with replacement) the probabilities the for each individual AAA:DDD. For each iteration I want to discretize the probabilities to 0’s and 1’s using a 0.50 cut off and then sum the vector.

I have created the function below which discretizes and sums.

BiSum <- function(x){
    IndBi <- ifelse(x >= 0.50, 1,0)
    SumIndBi <- sum(IndBi)
    }

I want to apply the function to the probabilities for each individual in a for() loop and use the boot() function as seen below.

require(boot)

SE <- numeric(length(unique(Data$IndID)))
for (i in unique(Data$IndID)){
    IndProbs <- Data$Prob[Data$IndID == i]
    b <- boot(IndProbs, BiSum, R=10)
    SE[i] <- sqrt(var(b$t))     #This is a roundabout way to grab the SE from a boot() object
        }

While I think the function BiSum is correct it is incorrectly incorporated into the for() loop and boot() function. The loop above results in the error:

Error in statistic(data, original, ...) : unused argument (original).

My goal: for each individual (AAA:DDD) I want to bootstrap Prob, discretize them using 0.50 as a cut off, and sum the resulting 0’s and 1’s. I want to do this R=10 (only low for example but will repeat R=10000 with real data) times for each AAA:DDD and then extract the standard error of the boot() object.

Suggestions on how to improve my loop above would be appreciated. Specifically how to correctly incorporate the ”statistic” argument to boot() within the loop.

Thanks in advance.

Upvotes: 1

Views: 563

Answers (1)

TWL
TWL

Reputation: 2300

You have to include index argument idx for the boot function

BiSum <- function(x, idx){
     IndBi <- ifelse(x[idx] >= 0.50, 1,0)
     SumIndBi <- sum(IndBi)
}

Also, to get SE, it may be cleaner to use sd(b$t) instead sqrt(var(b$t)

Upvotes: 1

Related Questions