Reputation: 568
Assume I have the following data structure:
sl_sev_disbn <- data.frame("lower_band" = c(0,10e6,20e6,30e6,0,0,0),
"upper_band" = c(10e6,20e6,30e6,40e6,0,0,0),
"prob" = c(0.56521739,0.34782609,0.08212560,0.00483092,0,0,0),
"band" = c(1,2,3,4,5,6,7))
I want to randomly sample a "band" from this dataframe using the given probabilities as weights. From this band, I want to sample a randomly number uniformally from the lower and upper bands given and store every sample and return it into a vector. I have the following code:
rsuper_lrg_disbn <- function(n = 1,df){
band_sample <- sample(x = df$band, size = n, prob = df$prob,replace=TRUE)
vals <- c()
for (band in band_sample){
filt_df <- df[df$band == band,] #filter to randomly selected band
loss <- runif(1,min=filt_df$lower_band,max=filt_df$upper_band)
vals <- c(vals,loss)
}
return(vals)
}
Then using it would be like: rsuper_lrg_disbn(n=2,sl_sev_disbn)
However, this code slows down massively if I use n very large such as n = 1e6.
Does anyone know how I can speed this up?
Upvotes: 3
Views: 70
Reputation: 1231
Take advantage of the fact that runif
is vectorized!
sl_sev_disbn <- data.frame("lower_band" = c(0,10e6,20e6,30e6,0,0,0),
"upper_band" = c(10e6,20e6,30e6,40e6,0,0,0),
"prob" = c(0.56521739,0.34782609,0.08212560,0.00483092,0,0,0),
"band" = c(1,2,3,4,5,6,7))
rsuper_lrg_disbn <- function(n = 1,df){
band_sample <- sample(x = df$band, size = n, prob = df$prob,replace=TRUE)
vals <- c()
for (band in band_sample){
filt_df <- df[df$band == band,] #filter to randomly selected band
loss <- runif(1,min=filt_df$lower_band,max=filt_df$upper_band)
vals <- c(vals,loss)
}
return(vals)
}
fast_samp <- function(n = 1, df) {
band_sample <- sample(x = df$band, size = n, prob = df$prob,replace=TRUE)
vals <- runif(n, min = df[band_sample, 'lower_band'], max = df[band_sample, 'upper_band'])
return(vals)
}
## same dist
summary(rsuper_lrg_disbn(n = 3e4, sl_sev_disbn))
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 832 4428211 8903344 10290102 15373486 39992683
summary(fast_samp(n = 3e4, sl_sev_disbn))
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 2278 4435472 8827781 10261747 15312544 39703908
library(microbenchmark)
microbenchmark(rsuper_lrg_disbn(n = 1e3, sl_sev_disbn),
fast_samp(n = 1e3, sl_sev_disbn))
#> Unit: microseconds
#> expr min lq mean
#> rsuper_lrg_disbn(n = 1000, sl_sev_disbn) 36032.381 37381.912 38232.6291
#> fast_samp(n = 1000, sl_sev_disbn) 75.062 79.012 115.8676
#> median uq max neval
#> 37672.677 38327.886 60730.445 100
#> 89.284 92.444 2689.974 100
Created on 2019-10-16 by the reprex package (v0.3.0)
Upvotes: 3