Reputation: 1
I am trying to find a way in R to randomly subset some data (the proportion of suitable habitat in an area, for an ecological study), calculate the mean and the proportion of samples with values > 0, and then save or append those values to a data frame. I then want to repeat this a number of times (1000 for the example). Standard bootstrapping or resampling packages won't work, as I need to calculate the frequency of occurrence as well as the mean of the subsample. I'm aware of the "apply" functions, but those loop over the entire data frame, whereas I'm trying to do it on a repeated subsample. I know I need some code to save and output the values calculated in the loop, but I'm having issues. "habprop" is a column in a data frame ("data") for which I want to calculate and save the mean and the proportion of positive values.
for (i in 1:1000) {
  randsample = data[sample(1:nrow(data), 50, replace=FALSE), ]
  m = mean(randsample$habprop)
  randsamplepos = subset(randsample, habprop > 0)
  # m and habfreq are overwritten on every iteration; nothing is saved yet
  habfreq = (nrow(randsamplepos)/nrow(randsample))
}
Upvotes: 0
Views: 1560
Reputation: 32426
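This should be possible using boot. Note that boot resamples all rows with replacement by default, rather than drawing 50 rows without replacement as in the question.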
dat <- data.frame(habprop=rnorm(100))
## Function to return statistics from subsamples
stat <- function(dat, inds)
  with(dat, c(mu=mean(habprop[inds]), freq=sum(habprop[inds] > 0)/length(inds)))
library(boot)
boot(data=dat, statistic=stat, R=1000)
# Bootstrap Statistics :
# original bias std. error
# t1* -0.06154533 -0.00324393 0.08377116
# t2* 0.52000000 -0.00073000 0.04853991
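The printed summary only shows bias and standard error, but the question asks for the per-replicate values in a data frame. A minimal sketch of one way to get them, assuming the same simulated `dat` as above: the object returned by `boot` keeps the replicate statistics in its `t` component, one row per replicate. The seed and the name `boot_df` are assumptions added for illustration.

```r
library(boot)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
dat <- data.frame(habprop = rnorm(100))

# Same statistics as above: mean and proportion of positive values
stat <- function(dat, inds) {
  x <- dat$habprop[inds]
  c(mu = mean(x), freq = mean(x > 0))
}

b <- boot(data = dat, statistic = stat, R = 1000)

# b$t is a matrix with one row per replicate and one column per statistic
boot_df <- data.frame(mu = b$t[, 1], freq = b$t[, 2])
head(boot_df)
```

Each row of `boot_df` is one resample, so it can be summarised or plotted like any other data frame.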
Upvotes: 0
Reputation: 702
How about the replicate function? This post looks pretty similar.
Generating some data to work on
data <- data.frame(x1=rpois(5000, 5), x2=runif(5000), x3=rnorm(5000))
Defining a function to sample and take means and counts
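sample_stats <- function(df, n=100){
  # Draw n rows at random, without replacement
  df <- df[sample(1:nrow(df), n, replace=FALSE),]
  # Mean of the positive x1 values only
  mx1 <- mean(df$x1[df$x1>0])
  # Number of rows with a positive x1
  x1pos <- sum(df$x1>0)
  return(c(mx1, x1pos))
}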
Run it once just to see the output
sample_stats(data)
Run it 1000 times
results <- replicate(1000, sample_stats(data, n=100))
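To tie this back to the question, here is a rough sketch of the same `replicate` pattern applied to a `habprop` column, returning the mean and the proportion of positive values from subsamples of 50 rows. The simulated `data`, the seed, and the names `subsample_stats` and `results_df` are assumptions for illustration, not the asker's real data. Since `replicate` returns one column per run, the result is transposed before being turned into a data frame.

```r
set.seed(42)  # assumption: fixed seed so the sketch is reproducible

# Hypothetical stand-in for the asker's data: proportions with many zeros
data <- data.frame(habprop = pmax(0, runif(500, min = -0.5, max = 1)))

subsample_stats <- function(df, n = 50) {
  # drop = FALSE keeps a one-column data frame from collapsing to a vector
  rows <- df[sample(nrow(df), n, replace = FALSE), , drop = FALSE]
  c(mean = mean(rows$habprop),          # mean of the whole subsample
    habfreq = mean(rows$habprop > 0))   # proportion of positive values
}

# replicate() gives a 2 x 1000 matrix; transpose to one row per run
res <- replicate(1000, subsample_stats(data, n = 50))
results_df <- as.data.frame(t(res))
head(results_df)
```

Returning a named vector from the function means the columns of `results_df` come out labelled `mean` and `habfreq` without any extra renaming.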
Upvotes: 2