Reputation: 208
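I am interested in estimating a Poisson fixed effects model for MOVE with STORE_COM_CODE-by-UPC_PRICE fixed effects, STORE_COM_CODE-by-WEEK fixed effects, and a separate coefficient for each value of AGE,
where AGE is the "age" of the observation.
I am interested in the AGE coefficients, not the other fixed effects.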
My first attempt at estimating this was as follows:
library(readr)
Data <- read_csv("FullData.csv", col_types = cols(UPC_PRICE = col_factor(), WEEK = col_factor(), MOVE = col_integer(), STORE_COM_CODE = col_factor(), AGE = col_factor()))
library(fixest)
Results = fepois(MOVE ~ AGE | STORE_COM_CODE^UPC_PRICE + STORE_COM_CODE^WEEK, Data, nthreads=28, verbose=1000)
But this results in fepois attempting to create a full matrix of dummies from the AGE variable, which is too large to fit in memory. (There are around 150 million observations, and AGE goes up to about 400.)
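A rough back-of-the-envelope check of why such a dummy matrix would not fit in memory. This is only a sketch: it assumes the matrix is stored densely as 8-byte doubles, which is an assumption about the internal representation rather than something confirmed here, and it uses the approximate sizes quoted above.

```r
# Approximate problem size, taken from the figures quoted in the question.
n_obs        <- 150e6  # about 150 million observations
n_age_levels <- 400    # AGE goes up to about 400

# Assumption: one dense 8-byte double per (observation, AGE level) cell.
bytes_per_double <- 8
dense_bytes <- n_obs * n_age_levels * bytes_per_double

# Size in gigabytes (10^9 bytes).
dense_gb <- dense_bytes / 1e9
dense_gb
```

Under those assumptions the dense matrix alone comes to roughly 480 GB, before any working copies are made.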
As an alternative, I tried:
Results = fepois(MOVE ~ 1 | STORE_COM_CODE^UPC_PRICE + STORE_COM_CODE^WEEK + AGE, Data, nthreads=28, verbose=1000)
FE = fixef(Results)
With this approach, the fepois call completes successfully, but then it fails in the fixef call (to get the fixed effects, where the AGE coefficients are now stored) with the message:
Problem getting FE, maximum iterations reached (1st order loop).NOTE: The fixed-effects are not regular, they cannot be straightforwardly interpreted. The number of references is only approximate.
Of course I could increase the number of iterations, but the fact that I'm getting this message suggests there's probably a better approach I don't know about. ("Regularity" is also an issue with this approach. It doesn't matter if the estimation drops certain columns from the STORE_COM_CODE^UPC_PRICE and STORE_COM_CODE^WEEK fixed effects, but I do not want it to drop any columns from the AGE fixed effects.)
How should I be approaching this estimation?
Incidentally: Despite setting nthreads, fepois still only uses one thread. Any ideas why? (Calling setFixest_nthreads(28) also makes no difference, it seems.)
Update 1: Setting iter=100000000 within the fixef call makes no difference. I still get the same error, suggesting it's a different iteration count that's being hit.
Update 2: Here are the first 10000 lines of the data set: https://gist.github.com/tholden/7cf0b4b8ae2b6030b60b704766903612 (*)
Update 3: getFixest_nthreads() returns 28, as expected (that's what I set it to, and it's also half the number of logical processors on my machine).
Upvotes: 0
Views: 180
Reputation: 1280
If I understand your problem correctly, you are getting something like this:
library(fixest)
library(readr)
examp_dat1 = read_csv('https://gist.githubusercontent.com/tholden/7cf0b4b8ae2b6030b60b704766903612/raw/d3b7a3810936344906f90b7d62b506ff42af0dd1/SampleData.csv', col_types = cols(UPC_PRICE = col_factor(), WEEK = col_factor(), MOVE = col_integer(), STORE_COM_CODE = col_factor(), AGE = col_factor()))
mod = fepois(MOVE ~ AGE | STORE_COM_CODE^UPC_PRICE + STORE_COM_CODE^WEEK, data = examp_dat1)
#> NOTE: 9/0 fixed-effects (394 observations) removed because of only 0 outcomes.
#> The variable 'AGE224' has been removed because of collinearity (see $collin.var).
mod
#> Poisson estimation, Dep. Var.: MOVE
#> Observations: 9,605
#> Fixed-effects: STORE_COM_CODE^UPC_PRICE: 315, STORE_COM_CODE^WEEK: 384
#> Standard-errors: Clustered (STORE_COM_CODE^UPC_PRICE)
#> Estimate Std. Error z value Pr(>|z|)
#> AGE3 -0.012467 11.6001 -0.001075 0.99914
#> AGE4 0.049981 23.2149 0.002153 0.99828
#> AGE5 -0.105345 34.8334 -0.003024 0.99759
#> AGE6 -0.161140 46.4345 -0.003470 0.99723
#> AGE7 -0.234467 58.0617 -0.004038 0.99678
#> AGE8 -0.172549 69.6805 -0.002476 0.99802
#> AGE9 -0.130779 81.2899 -0.001609 0.99872
#> AGE10 -0.112788 92.8970 -0.001214 0.99903
#> ... 324 coefficients remaining (display them with summary() or use argument n)
#> ... 1 variable was removed because of collinearity (AGE224)
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -12,241.4 Adj. Pseudo R2: 0.249849
#> BIC: 33,928.0 Squared Cor.: 0.551105
What is happening is that you are treating AGE as a factor when you import the data, so fepois is estimating a coefficient for every level except the reference. If you are interested in the effect of age, then all you need to do is either coerce it to numeric or omit the AGE = col_factor() when you import:
examp_dat2 = read_csv('https://gist.githubusercontent.com/tholden/7cf0b4b8ae2b6030b60b704766903612/raw/d3b7a3810936344906f90b7d62b506ff42af0dd1/SampleData.csv', col_types = cols(UPC_PRICE = col_factor(), WEEK = col_factor(), MOVE = col_integer(), STORE_COM_CODE = col_factor()))
mod2 = fepois(MOVE ~ AGE | STORE_COM_CODE^UPC_PRICE + STORE_COM_CODE^WEEK, data = examp_dat2)
#> NOTE: 9/0 fixed-effects (394 observations) removed because of only 0 outcomes.
mod2
#> Poisson estimation, Dep. Var.: MOVE
#> Observations: 9,605
#> Fixed-effects: STORE_COM_CODE^UPC_PRICE: 315, STORE_COM_CODE^WEEK: 384
#> Standard-errors: Clustered (STORE_COM_CODE^UPC_PRICE)
#> Estimate Std. Error z value Pr(>|z|)
#> AGE 1.3405 57551.2 2.3e-05 0.99998
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -12,567.5 Adj. Pseudo R2: 0.250126
#> BIC: 31,544.9 Squared Cor.: 0.504288
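If you would rather coerce an AGE column that has already been read in as a factor, a minimal sketch of that route in base R looks like the following. The toy vector age_factor is made up for illustration and is not taken from the data set.

```r
# Hypothetical toy factor standing in for an AGE column read with col_factor().
age_factor <- factor(c("5", "10", "224", "10"))

# Pitfall: as.numeric() on a factor returns the internal level codes,
# not the ages themselves.
age_codes <- as.numeric(age_factor)

# Going through as.character() first recovers the actual numeric ages.
age_numeric <- as.numeric(as.character(age_factor))

age_codes
age_numeric
```

Keep in mind that a numeric AGE gives a single slope, as in the mod2 output above, rather than one coefficient per age level.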
As for setFixest_nthreads(): if you want to throw all the available threads at the problem, you need to call setFixest_nthreads(nthreads = 0).
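A small sketch of how one might apply and then verify that setting. It assumes fixest is installed and was built with OpenMP support; the variable name n_threads is only for illustration, and the guard simply skips the calls when the package is missing.

```r
# n_threads stays NA if fixest is not available on this machine.
n_threads <- NA_integer_

if (requireNamespace("fixest", quietly = TRUE)) {
  # nthreads = 0 asks fixest to use all available threads.
  fixest::setFixest_nthreads(nthreads = 0)

  # Read the setting back to see what fixest will actually use.
  n_threads <- fixest::getFixest_nthreads()
}

n_threads
```

If the value read back is 1 on a multi-core machine, one possible explanation is that the installed build of fixest has no OpenMP support, which is worth ruling out before looking elsewhere.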
Upvotes: 0