R regression on multiple samples

Question

I am using R.

I have a panel dataset of ~5000 observations of 250 individuals over time.

I need to build a difference-in-differences regression, so I draw one random observation for each individual and run a regression:

lm(x ~ x1 + x2 + ... , data = ddply(df,.(individual),function(x) x[sample(nrow(x),1),]))

over the resulting sample.

I need to compute the regression n times on n different random samples and compute the average of each estimator.

Is there a way to do this efficiently without manually computing and averaging n regressions?

DAR79 · Accepted Answer

Solved:

I expected to find a dedicated package for this, but I wrote a function instead. For example, for n = 700:

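library(plyr)  # provides ddply

# Draw one random observation per individual and return the fitted coefficients
fun <- function(){
  alfa <- ddply(df, .(individual), function(x) x[sample(nrow(x), 1), ])
  beta <- lm(x ~ x1 + x2 + ... , data = alfa)$coefficients
  return(beta)
}

df.full <- replicate(700, fun())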

This produces a matrix with 700 columns, one per replication, and the coefficient names as rows. I can even do something like this:

fun <- function(){
  alfa <- ddply(df, .(individual), function(x) x[sample(nrow(x), 1), ])
  beta <- lm(x ~ x1 + x2 + ... , data = alfa)
  # Column 1 of the coefficient table holds the estimates
  gamma <- summary(beta)[["coefficients"]][,1]
  return(gamma)
}

df.full <- replicate(700, fun())

Replacing [,1] with [,2] returns the standard errors instead. After this, computing the means is straightforward, for example with rowMeans(df.full).
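For anyone who wants to try the whole procedure end to end, here is a minimal sketch on simulated data, using base R only. The data frame, the variable names (y, x1, x2), the true coefficients and the helper names rows_by_id and fit_once are all made up for illustration and are not from my actual dataset. It draws one row per individual, refits the model 700 times and averages each coefficient.

```r
# Simulated panel: 250 individuals, 20 observations each (hypothetical data).
set.seed(1)
n_id <- 250
n_obs <- 20
df <- data.frame(
  individual = rep(seq_len(n_id), each = n_obs),
  x1 = rnorm(n_id * n_obs),
  x2 = rnorm(n_id * n_obs)
)
df$y <- 1 + 2 * df$x1 - 0.5 * df$x2 + rnorm(nrow(df))

# Row indices grouped by individual, computed once outside the loop.
rows_by_id <- split(seq_len(nrow(df)), df$individual)

# Draw one random row per individual and return the fitted coefficients.
# sample.int(length(idx), 1) is used so that an individual with a single
# row is handled correctly.
fit_once <- function() {
  picked <- vapply(rows_by_id,
                   function(idx) idx[sample.int(length(idx), 1)],
                   integer(1))
  coef(lm(y ~ x1 + x2, data = df[picked, ]))
}

# Matrix with one row per coefficient and one column per replication.
coefs <- replicate(700, fit_once())

# Average of each coefficient across replications.
avg_coefs <- rowMeans(coefs)
```

Precomputing the row indices avoids re-splitting the data frame on every replication, which is where most of the time goes in the ddply version. apply(coefs, 1, sd) gives the spread of each coefficient across the draws.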
