Reputation: 915
In a regression I am trying to model unit-specific time trends, but I keep running into difficulties. In R, when I estimate the model with unit and year fixed effects, as in lm(y~x+factor(unit)+factor(year)), I get perfectly normal results. However, when I try lm(y~x+factor(unit)*factor(year)), I run into trouble because NAs are produced.
Using some mock data to illustrate:
# Unit of analysis are countries
country<-c(rep("Isthmus",10),rep("Nambutu",10),rep("San Monique",10))
ccode<-c(rep(1,10),rep(2,10),rep(3,10))
year <- c(rep(2000:2009,3)) # Time
x1<-rnorm(30)*ccode
x2<-runif(30)
y<-0.5*x1-0.3*x2+rnorm(30) # Outcome variable
df=data.frame(country,ccode,year,y,x1,x2)
Estimating the model using fixed effects for units and time, country and year respectively:
m0<-lm(y~x1+x2+factor(ccode)+factor(year),df);summary(m0)
# Part of the regression output:
Coefficients:
                 Estimate Std. Error t value  Pr(>|t|)
(Intercept)      -0.92780    0.68231  -1.360    0.1928
x1                0.59290    0.10058   5.895 0.0000226 ***
x2               -0.36457    0.96036  -0.380    0.7092
factor(ccode)2    0.95383    0.48675   1.960    0.0677 .
factor(ccode)3    0.46050    0.46475   0.991    0.3365
factor(year)2001  0.15222    0.87295   0.174    0.8638
No problems here. Now I estimate the model using the unit-specific time trends:
m1<-lm(y~x1+x2+factor(year)*factor(ccode),df);summary(m1)
# Part of the regression output:
                                Estimate Std. Error t value Pr(>|t|)
(Intercept)                       1.3408         NA      NA       NA
x1                                3.3104         NA      NA       NA
x2                                0.5239         NA      NA       NA
factor(ccode)2                   -2.0544         NA      NA       NA
factor(ccode)3                  -12.2971         NA      NA       NA
factor(year)2001:factor(ccode)1  -3.4409         NA      NA       NA
factor(year)2002:factor(ccode)1  -0.6348         NA      NA       NA
In this particular case the NAs seem to be the result of too many variables in the model, as there are no degrees of freedom left. The same issue occurs when using a larger dataset. I am not entirely sure what is going wrong here. I assume it has something to do with the way I use factor() to model the unit-specific time trends, but so far I have not been able to solve it.
Does anyone have an idea how to do this properly? Any suggestions are welcome.
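The suspicion about degrees of freedom can be checked directly. The following is a minimal sketch, assuming the same mock data as above; the set.seed() call and the helper names n_obs, n_coef, n_na and df_left are additions for illustration only. It compares the number of coefficients the formula asks for with the number of observations.

```r
# Rebuild the mock data (seed added only so the sketch is reproducible)
set.seed(1)
ccode <- rep(1:3, each = 10)
year  <- rep(2000:2009, times = 3)
x1 <- rnorm(30) * ccode
x2 <- runif(30)
y  <- 0.5 * x1 - 0.3 * x2 + rnorm(30)
df <- data.frame(ccode, year, y, x1, x2)

# Fully interacted year-by-country dummies, as in m1
m1 <- lm(y ~ x1 + x2 + factor(year) * factor(ccode), data = df)

n_obs   <- nrow(df)              # number of observations
n_coef  <- length(coef(m1))      # coefficients requested by the formula (NAs included)
n_na    <- sum(is.na(coef(m1)))  # coefficients lm() could not estimate
df_left <- df.residual(m1)       # residual degrees of freedom

c(n_obs = n_obs, n_coef = n_coef, n_na = n_na, df_left = df_left)
```

With one observation per country-year cell, the interaction alone already fits every data point, so the formula requests 32 coefficients for 30 observations and no residual degrees of freedom remain for computing standard errors.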
Upvotes: 2
Views: 4766
Reputation: 60492
You are trying to estimate more parameters than you have data points, i.e. n < p. In your example data set, you have
R> nrow(df)
[1] 30
data points and are trying to estimate 32 coefficients (an intercept, x1, x2, 9 year dummies, 2 country dummies and 18 year-by-country interactions). As Ben points out, you are estimating a different parameter for each year. If you want to assume a linear trend, then just have
lm(y ~ x1 + factor(ccode)*year, data=df)
or to include a quadratic trend
lm(y ~ x1 + factor(ccode)*poly(year, 2), data=df)
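As a rough illustration of the difference this makes, the sketch below fits country-specific linear and quadratic trends to the question's mock data and checks how many degrees of freedom remain. Several things here are assumptions rather than part of the answer: the seed, the centred time variable t, keeping x2 in the model as in the question's m0, and the use of poly() so that the quadratic model contains both the linear and the squared term.

```r
# Rebuild the mock data from the question (seed added for reproducibility)
set.seed(1)
ccode <- rep(1:3, each = 10)
year  <- rep(2000:2009, times = 3)
x1 <- rnorm(30) * ccode
x2 <- runif(30)
y  <- 0.5 * x1 - 0.3 * x2 + rnorm(30)
df <- data.frame(ccode, year, y, x1, x2)

# Numeric time index starting at 0 (optional; keeps intercepts interpretable)
df$t <- df$year - 2000

# Country-specific linear trends: one slope on t per country
m_lin <- lm(y ~ x1 + x2 + factor(ccode) * t, data = df)

# Country-specific quadratic trends via orthogonal polynomials
m_quad <- lm(y ~ x1 + x2 + factor(ccode) * poly(t, 2), data = df)

length(coef(m_lin))   # intercept, x1, x2, 2 country dummies, t, 2 country-by-t slopes
df.residual(m_lin)    # degrees of freedom left for standard errors
df.residual(m_quad)
summary(m_lin)
```

Because t enters as a numeric variable rather than a factor, each country costs one extra slope instead of nine extra year dummies, which is why standard errors become available again.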
Upvotes: 5