user8660846

Reputation: 21

How to calculate marginal effects of a logit model with fixed effects on a sample of more than 50 million observations

I have a sample of more than 50 million observations. I estimate the following model in R:

model1 <- feglm(rejection~  variable1+ variable1^2 +  variable2+ variable3+ variable4 | city_fixed_effects + year_fixed_effects, family=binomial(link="logit"),  data=database)

Based on the estimates from model1, I calculate the marginal effects:

mfx2 <- marginaleffects(model1)
summary(mfx2)

This call also calculates the marginal effects of each fixed effect, which slows R down considerably. I only need the average marginal effects of variable1, variable2, and variable3. If I instead compute the marginal effects one variable at a time, using mfx2 <- marginaleffects(model1, variables = "variable1"), the output does not show the standard error or the p-value of the average marginal effect.

Any solution for this issue?

Upvotes: 2

Views: 2467

Answers (1)

Vincent

Reputation: 17805

Both the fixest and the marginaleffects packages have made recent changes to improve interoperability. The next official CRAN releases will be able to do this, but as of 2021-12-08 you can use the development versions. Install:

library(remotes)
install_github("lrberge/fixest")
install_github("vincentarelbundock/marginaleffects")

I recommend converting your fixed effects variables to factors before fitting your models:

library(fixest)
library(marginaleffects)

dat <- mtcars
dat$gear <- as.factor(dat$gear)

mod <- feglm(am ~ mpg + mpg^2 + hp + hp^3 | gear,
             family = binomial(link = "logit"),
             data = dat)

Then, you can use marginaleffects and summary to compute average marginal effects:

mfx <- marginaleffects(mod, variables = "mpg")
summary(mfx)
## Average marginal effects 
##       type Term Effect Std. Error  z value Pr(>|z|)  2.5 % 97.5 %
## 1 response  mpg 0.3352         40 0.008381  0.99331 -78.06  78.73
## 
## Model type:  fixest 
## Prediction type:  response

Note that computing average marginal effects requires calculating a distinct marginal effect for every single row of your dataset. This can be computationally expensive when your data includes millions of observations.
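
To make the cost concrete, here is a minimal base-R sketch of what an average marginal effect is for a logit. It uses glm() on mtcars without fixed effects rather than feglm(), so it is an illustration of the idea and not what marginaleffects does internally. The subsample step at the end is a hypothetical shortcut for very large data; it gives a point estimate only, without standard errors.

```r
# Illustration with base R's glm(), not feglm(): a plain logit, no fixed effects
mod_glm <- glm(am ~ mpg, family = binomial(link = "logit"), data = mtcars)
b_mpg <- unname(coef(mod_glm)["mpg"])

# One marginal effect per row: for a logit, dP/dmpg = beta * p * (1 - p)
p_hat  <- predict(mod_glm, type = "response")
me_row <- b_mpg * p_hat * (1 - p_hat)

# The average marginal effect is the mean of the row-level effects
ame_full <- mean(me_row)

# Hypothetical shortcut for huge data: average over a random subsample of rows
set.seed(1)
idx     <- sample(nrow(mtcars), size = 16)
ame_sub <- mean(me_row[idx])
```

The work grows with the number of rows because me_row has one entry per observation; with 50 million rows, that vector (and the Jacobian needed for standard errors) is what becomes expensive.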

Instead, you can compute marginal effects for specific values of the regressors using the newdata argument and the typical function. Please refer to the marginaleffects documentation for details on those:

marginaleffects(mod, 
                variables = "mpg", 
                newdata = typical(mpg = 22, gear = 4))
##   rowid     type term     dydx std.error       hp mpg gear predicted
## 1     1 response  mpg 1.068844   50.7849 146.6875  22    4 0.4167502
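
For comparison, a marginal effect at one chosen regressor value needs only a single prediction. The sketch below does this by hand with base R's glm() on a simplified model (mpg only, no fixed effects), so the numbers will not match the feglm output above; it only shows the calculation.

```r
# Simplified illustration: plain logit via glm(), not the feglm model above
mod_glm <- glm(am ~ mpg, family = binomial(link = "logit"), data = mtcars)
b_mpg <- unname(coef(mod_glm)["mpg"])

# Predicted probability at the chosen value mpg = 22
p_22 <- predict(mod_glm, newdata = data.frame(mpg = 22), type = "response")

# Marginal effect of mpg at that point: beta * p * (1 - p)
me_22 <- unname(b_mpg * p_22 * (1 - p_22))
```

Because only one row is evaluated, the cost does not depend on the size of the original dataset.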

Upvotes: 4
