Nick
Nick

Reputation: 81

R: how to call feols regression within a function

I am trying to write a function to return regression coefficient and standard errors since I need run a large number of regressions. The data could look like this

library(tidyverse)
library(fixest)
library(broom)
data<-tibble(Date = c("2020-01-01","2020-01-01","2020-01-01","2020-01-01","2020-02-01","2020-02-01","2020-02-01","2020-02-01"),
         Card = c(1,2,3,4,1,2,3,4),
         A = rnorm(8),
         B = rnorm(8),
         C = rnorm(8)
         )

My current code is as following

estimation_fun <- function(col1,col2,df) {

  regression<-feols(df[[col1]] ~ df[[col2]] | Card + Date, df)

  est =tidy(regression)$estimate
  se = tidy(regression)$std.error

  output <- list(est,se)
  return(output)

}
estimation_fun("A","B",example)

However, it does not work. I guess it is related to column name in feols because I can make it work for lm().

Upvotes: 2

Views: 1322

Answers (2)

Laurent Berg&#233;
Laurent Berg&#233;

Reputation: 1372

Ronak's right: only formulas made of variable names can be used.

Since fixest 0.10.0, you can use the dot square bracket operator to do just that. See the help page for formula manipulation in xpd.

Just change one line in your code to make it work:

estimation_fun <- function(lhs, rhs, df) {
  # lhs must be of length 1 (otherwise => not what you'd want)
  # rhs can be a vector of variables
  regression <- feols(.[lhs] ~ .[rhs] | Card + Date, df)
  
  # etc...
}
# Example of how ".[]" works: 

lhs = "A"
rhs = c("B", "C")
feols(.[lhs] ~ .[rhs], data)
#> OLS estimation, Dep. Var.: A
#> Observations: 8 
#> Standard-errors: IID 
#>              Estimate Std. Error   t value Pr(>|t|) 
#> (Intercept)  0.375548   0.428293  0.876849  0.42069 
#> B           -0.670476   0.394592 -1.699164  0.15004 
#> C            0.177647   0.537452  0.330536  0.75440 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 0.737925   Adj. R2: 0.183702

By the way, I recommend to use the built-in multiple estimation facility (see help here) since estimation speed will be substantially improved.

Update

All combinations can be estimated with one line of code:

# All combinations at once
est_all = feols(c(A, B, C) ~ sw(A, B, C) | Card + Date, data)

Extraction of coefs/SEs can be done with another line:

# Coef + SE // see doc for summary.fixest_multi
coef_se_all = summary(est_all, type = "se_long")
coef_se_all
#>    lhs rhs type          A          B          C
#> 1    A   A coef  1.0000000         NA         NA
#> 2    A   A   se        NaN         NA         NA
#> 3    A   B coef         NA  0.8204932         NA
#> 4    A   B   se         NA  1.1102853         NA
#> 5    A   C coef         NA         NA -0.7889534
#> 6    A   C   se         NA         NA  0.3260451
#> 7    B   A coef  0.2456443         NA         NA
#> 8    B   A   se  0.2314143         NA         NA
#> 9    B   B coef         NA  1.0000000         NA
#> 10   B   B   se         NA        NaN         NA
#> 11   B   C coef         NA         NA -0.1977089
#> 12   B   C   se         NA         NA  0.3335988
#> 13   C   A coef -0.4696954         NA         NA
#> 14   C   A   se  0.3858851         NA         NA
#> 15   C   B coef         NA -0.3931512         NA
#> 16   C   B   se         NA  0.8584968         NA
#> 17   C   C coef         NA         NA  1.0000000
#> 18   C   C   se         NA         NA        NaN

NOTA: it requires fixest 0.10.1 or higher.

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388907

feols function needs a formula object. You can create it using paste0/sprintf.

estimation_fun <- function(col1,col2,df) {
  
  regression<-feols(as.formula(sprintf('%s ~ %s | Card + Date', col1, col2)), df)
  
  est =tidy(regression)$estimate
  se = tidy(regression)$std.error
  
  output <- list(est,se)
  return(output)
  
}

estimation_fun("A","B",data)

#[[1]]
#[1] -0.1173276
#attr(,"type")
#[1] "Clustered (Card)"

#[[2]]
#[1] 1.083011
#attr(,"type")
#[1] "Clustered (Card)"

To apply this for every pair of variables you may do -

cols <- names(data)[-(1:2)]

do.call(rbind, combn(cols, 2, function(x) {
  data.frame(cols = paste0(x, collapse = '-'), 
             t(estimation_fun(x[1],x[2],data)))
}, simplify = FALSE))

   cols         X1        X2
#1  A-B -0.1173276  1.083011
#2  A-C -0.1117691 0.5648162
#3  B-C -0.3771884 0.1656587

Upvotes: 2

Related Questions