Xin

Reputation: 674

Repeating a function and recording the results for each run

I have a simulation that I want to run multiple times, with the variable 'draw' starting at 19 and increasing by one on each run, all the way up to 52. After each run I want to record the results by calling summary() on 'sim' and put them into a data frame. How can I use a loop here so that I do not have to change the draw value and record the results manually each time? Desired results:

draw  Min.  1st Qu.  Median  Mean   3rd Qu.  Max.
19    16    27       30      29.85  33       45
20    22    30       33      33.13  37       50
21
.
.
52

Code:


library(dplyr)

N      <- 2500
d      <- data.frame(id = 1:N)
draw   <- 19     ## changing variable 
n      <- 22       
n_runs <- 500


sim <- c()

set.seed(123)
for (j in 1:n_runs) {
  all <- c()
  for (i in 1:draw) {
    srs <- sample_n(d, n, replace = FALSE)
    all <- bind_rows(all, srs)
  }
  repeats <- all %>%
    group_by(id) %>%
    mutate(freq = n()) %>%
    filter(freq > 1) %>%
    n_distinct(id) %>%
    as.data.frame()
  sim <- bind_rows(sim, repeats)
}

summary(sim)

Upvotes: 1

Views: 173

Answers (1)

StupidWolf

Reputation: 46888

You already have working code; you just need to wrap it in a function.

This part of your code simply counts how many unique ids appear more than once:

repeats <- all %>%
    group_by(id) %>%
    mutate(freq = n()) %>%
    filter(freq > 1) %>%
    n_distinct(id) %>%
    as.data.frame()

And you can simplify it to this:

sum(table(all$id)>1)
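
To see why the one-liner counts the same thing, here is a small sketch on made-up ids (toy_ids is a hypothetical vector, not data from the question). It compares the table() count with a base R cross-check using duplicated():

```r
# Made-up ids: 3 appears three times, 7 twice, 12 and 25 once each
toy_ids <- c(3, 7, 7, 12, 3, 3, 25)

# table() tallies how often each id occurs
counts <- table(toy_ids)

# ids occurring more than once: 3 and 7
n_repeated <- sum(counts > 1)

# cross-check: distinct values among the duplicated entries
n_check <- length(unique(toy_ids[duplicated(toy_ids)]))

print(n_repeated)  # 2
print(n_check)     # 2
```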

Without changing too much of what you have, your function will look like this. I replaced "all" with ALL because "all" is a function in R:

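library(dplyr)  # provides sample_n() and bind_rows()

func = function(draw, d, n, n_runs) {
  sim <- c()
  for (j in 1:n_runs) {
    # stack `draw` samples of n rows each
    ALL <- c()
    for (i in 1:draw) {
      srs <- sample_n(d, n, replace = FALSE)
      ALL <- bind_rows(ALL, srs)
    }
    # number of ids that were sampled more than once in this run
    repeats <- sum(table(ALL$id) > 1)
    sim <- c(sim, repeats)
  }
  summary(sim)
}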

To test, you do:

set.seed(123)
func(19,data.frame(id=1:2500),22,500)

This should give exactly the same result as your original code. Now apply the function with map, changing only draw:

library(purrr)
library(dplyr)
set.seed(123)
res = 19:22 %>% map(func,data.frame(id=1:2500),22,500)
cbind(19:22,do.call(rbind,res))
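
If a data frame with a named draw column is preferred over the cbind matrix, something like the following sketch may work. The summaries here come from made-up vectors (toy_sims is a hypothetical stand-in for the simulation output), so only the reshaping step is illustrated:

```r
# Hypothetical stand-in for the per-draw simulation results
draws    <- 19:21
toy_sims <- list(c(16, 27, 30, 45), c(22, 30, 33, 50), c(25, 33, 36, 52))

# One summary() per draw value, as map(func, ...) would return
res <- lapply(toy_sims, summary)

# Stack the summaries row-wise and prepend the draw column;
# check.names = FALSE keeps names such as "1st Qu." unchanged
out <- data.frame(draw = draws, do.call(rbind, res), check.names = FALSE)
print(out)
```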

I did not run all of 19:52 because it is too slow. You can try to speed the code up by avoiding so many bind_rows calls :) Hope this is what you need.
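
One possible way to avoid the repeated bind_rows calls, assuming d only ever holds the ids 1:N as in the question, is to sample the ids directly and count them with tabulate(). This is a rough sketch, not a drop-in replacement: func_fast and its N argument are names introduced here for illustration, and it is not guaranteed to reproduce the exact numbers of the sample_n version for the same seed.

```r
# Hypothetical faster variant: works on id vectors instead of data frames
func_fast <- function(draw, N, n, n_runs) {
  sim <- vapply(seq_len(n_runs), function(j) {
    # `draw` samples of n ids each, without replacement within a sample
    ids <- unlist(lapply(seq_len(draw), function(i) sample.int(N, n)))
    # number of ids that occur more than once across the samples
    sum(tabulate(ids, nbins = N) > 1)
  }, integer(1))
  summary(sim)
}

set.seed(123)
draws <- 19:22
res_fast <- lapply(draws, func_fast, N = 2500, n = 22, n_runs = 500)
print(cbind(draw = draws, do.call(rbind, res_fast)))
```

Because no data frames are built inside the loops, extending draws to 19:52 should be much cheaper than with the bind_rows version.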

Upvotes: 1
