tsgsteele
tsgsteele

Reputation: 97

For loop within a function using function arguments

I am running regressions on a data set with > 4 million observations. I am trying to do a sensitivity analysis to see how sensitive my regression coefficients are to different sample sizes. I want to:

I want to replicate the above steps 400 times to create a distribution of coefficients and plot the mean and confidence intervals. I have created two functions to do this. The first, coef_sampler, runs a regression on a sample of the data and returns the coefficient:

coef_sampler <- function(data,n){
      model <- lm(y ~ x, data = data[sample(nrow(data), n),])
      return(model[["coefficients"]][[1]])
  }

The second, my_function, replicates this process 400 times for different sample sizes, generating vectors of coefficients for different sample sizes. It then calculates the mean of each vector along with it's upper and lower bounds using the CI() function from Rmisc:

my_function <- function(data, n, nrep, to, by){

  for (j in seq.int(n, to, by)){
    plist <- data.frame(pval = replicate(nrep, coef_sampler(data,j)))

    if(!exists("ci_mat")) {
      ci_mat <- data.frame(CI(plist[,1]))
    } else {
      ci_mat <- cbind(ci_mat, data.frame(CI(plist[,1])))
    }
  }
}

When I just run the for loop with the values I want for number of repetitions, and a sequence of sample sizes, it works fine. When I call my_function() with the above values, it simply doesn't work. No data frame of means and confidence intervals is generated. Note that I call the coef_sampler() function within my_function, could this be causing the problem? Or is it using a for loop that uses the function arguments causing the problem.

Hopefully my question makes sense, any and all help would be greatly appreciated!

Upvotes: 0

Views: 56

Answers (2)

mnist
mnist

Reputation: 6956

Just include a return at the end of your function

library(Rmisc)

my_data <- tibble(x = 1:5,
                  y = rnorm(5))

coef_sampler <- function(data,n){
  model <- lm(y ~ x, data = data[sample(nrow(data),n),])
  return(model[["coefficients"]][[1]])
}

my_function <- function(data,n,nrep,to,by){
  for (j in seq.int(n,to,by)){
    plist <- data.frame(pval = replicate(nrep,coef_sampler(data,j)))

    if(!exists("ci_mat")){
      ci_mat <- data.frame(CI(plist[,1]))
    } else {
      ci_mat <- cbind(ci_mat, data.frame(CI(plist[,1])))
    }
  }
  return(ci_mat)
}

my_function(my_data, 1, 2, 5, 1)

Upvotes: 2

slava-kohut
slava-kohut

Reputation: 4233

The problem is that your function does not return the populated ci_mat. This variables only lives in the function environment and gets destroyed after the function run:

df <- data.frame(x = 1:100, y = 4*(1:100) + 5 + rnorm(100,5,10))

coef_sampler <- function(data,n){
  model <- lm(y ~ x, data = data[sample(nrow(data),n),])
  return(model[["coefficients"]][[1]])
}

my_function <- function(data,n,nrep,to,by){

  for (j in seq.int(n,to,by)){
    plist <- data.frame(pval = replicate(nrep,coef_sampler(data,j)))

    if(!exists("ci_mat")){
      ci_mat <- data.frame(Rmisc::CI(plist[,1]))
    }else{
      ci_mat <- cbind(ci_mat, data.frame(Rmisc::CI(plist[,1])))
    }
  }
  ci_mat
}    

# sample call
my_function(df, 5, 10, 100,1)

Make your function return ci_mat.

Upvotes: 1

Related Questions