Reputation: 81
I am trying to write a function to return regression coefficient and standard errors since I need run a large number of regressions. The data could look like this
library(tidyverse)
library(fixest)
library(broom)
data<-tibble(Date = c("2020-01-01","2020-01-01","2020-01-01","2020-01-01","2020-02-01","2020-02-01","2020-02-01","2020-02-01"),
Card = c(1,2,3,4,1,2,3,4),
A = rnorm(8),
B = rnorm(8),
C = rnorm(8)
)
My current code is as following
estimation_fun <- function(col1,col2,df) {
regression<-feols(df[[col1]] ~ df[[col2]] | Card + Date, df)
est =tidy(regression)$estimate
se = tidy(regression)$std.error
output <- list(est,se)
return(output)
}
estimation_fun("A","B",example)
However, it does not work. I guess it is related to column name in feols because I can make it work for lm()
.
Upvotes: 2
Views: 1322
Reputation: 1372
Ronak's right: only formulas made of variable names can be used.
Since fixest
0.10.0, you can use the dot square bracket operator to do just that. See the help page for formula manipulation in xpd.
Just change one line in your code to make it work:
estimation_fun <- function(lhs, rhs, df) {
# lhs must be of length 1 (otherwise => not what you'd want)
# rhs can be a vector of variables
regression <- feols(.[lhs] ~ .[rhs] | Card + Date, df)
# etc...
}
# Example of how ".[]" works:
lhs = "A"
rhs = c("B", "C")
feols(.[lhs] ~ .[rhs], data)
#> OLS estimation, Dep. Var.: A
#> Observations: 8
#> Standard-errors: IID
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.375548 0.428293 0.876849 0.42069
#> B -0.670476 0.394592 -1.699164 0.15004
#> C 0.177647 0.537452 0.330536 0.75440
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 0.737925 Adj. R2: 0.183702
By the way, I recommend to use the built-in multiple estimation facility (see help here) since estimation speed will be substantially improved.
All combinations can be estimated with one line of code:
# All combinations at once
est_all = feols(c(A, B, C) ~ sw(A, B, C) | Card + Date, data)
Extraction of coefs/SEs can be done with another line:
# Coef + SE // see doc for summary.fixest_multi
coef_se_all = summary(est_all, type = "se_long")
coef_se_all
#> lhs rhs type A B C
#> 1 A A coef 1.0000000 NA NA
#> 2 A A se NaN NA NA
#> 3 A B coef NA 0.8204932 NA
#> 4 A B se NA 1.1102853 NA
#> 5 A C coef NA NA -0.7889534
#> 6 A C se NA NA 0.3260451
#> 7 B A coef 0.2456443 NA NA
#> 8 B A se 0.2314143 NA NA
#> 9 B B coef NA 1.0000000 NA
#> 10 B B se NA NaN NA
#> 11 B C coef NA NA -0.1977089
#> 12 B C se NA NA 0.3335988
#> 13 C A coef -0.4696954 NA NA
#> 14 C A se 0.3858851 NA NA
#> 15 C B coef NA -0.3931512 NA
#> 16 C B se NA 0.8584968 NA
#> 17 C C coef NA NA 1.0000000
#> 18 C C se NA NA NaN
NOTA: it requires fixest
0.10.1 or higher.
Upvotes: 1
Reputation: 388907
feols
function needs a formula object. You can create it using paste0
/sprintf
.
estimation_fun <- function(col1,col2,df) {
regression<-feols(as.formula(sprintf('%s ~ %s | Card + Date', col1, col2)), df)
est =tidy(regression)$estimate
se = tidy(regression)$std.error
output <- list(est,se)
return(output)
}
estimation_fun("A","B",data)
#[[1]]
#[1] -0.1173276
#attr(,"type")
#[1] "Clustered (Card)"
#[[2]]
#[1] 1.083011
#attr(,"type")
#[1] "Clustered (Card)"
To apply this for every pair of variables you may do -
cols <- names(data)[-(1:2)]
do.call(rbind, combn(cols, 2, function(x) {
data.frame(cols = paste0(x, collapse = '-'),
t(estimation_fun(x[1],x[2],data)))
}, simplify = FALSE))
cols X1 X2
#1 A-B -0.1173276 1.083011
#2 A-C -0.1117691 0.5648162
#3 B-C -0.3771884 0.1656587
Upvotes: 2