horseoftheyear

Reputation: 915

R - Unit specific time trends in regression

In a regression I am trying to model unit-specific time trends, but I keep running into difficulties. In R, when I estimate the model with unit and year fixed effects, as in lm(y~x+factor(unit)+factor(year)), I get perfectly normal results. However, when I try lm(y~x+factor(unit)*factor(year)), I run into trouble because NAs are produced.

Using some mock data to illustrate:

# Units of analysis are countries
country <- c(rep("Isthmus", 10), rep("Nambutu", 10), rep("San Monique", 10))
ccode <- c(rep(1, 10), rep(2, 10), rep(3, 10))
year <- rep(2000:2009, 3) # Time
x1 <- rnorm(30) * ccode
x2 <- runif(30)
y <- 0.5 * x1 - 0.3 * x2 + rnorm(30) # Outcome variable
df <- data.frame(country, ccode, year, y, x1, x2)

Estimating the model using fixed effects for units and time, country and year respectively:

m0 <- lm(y ~ x1 + x2 + factor(ccode) + factor(year), data = df)
summary(m0)

# Part of the regression output:

Coefficients:
                  Estimate Std. Error t value  Pr(>|t|)    
(Intercept)      -0.92780    0.68231  -1.360    0.1928    
x1                0.59290    0.10058   5.895 0.0000226 ***
x2               -0.36457    0.96036  -0.380    0.7092    
factor(ccode)2    0.95383    0.48675   1.960    0.0677 .  
factor(ccode)3    0.46050    0.46475   0.991    0.3365    
factor(year)2001  0.15222    0.87295   0.174    0.8638 

No problems here. Now I estimate the model using the unit-specific time trends:

m1 <- lm(y ~ x1 + x2 + factor(year) * factor(ccode), data = df)
summary(m1)

# Part of the regression output:

                                  Estimate Std. Error t value Pr(>|t|)
(Intercept)                       1.3408         NA      NA       NA
x1                                3.3104         NA      NA       NA
x2                                0.5239         NA      NA       NA
factor(ccode)2                   -2.0544         NA      NA       NA
factor(ccode)3                  -12.2971         NA      NA       NA
factor(year)2001:factor(ccode)1  -3.4409         NA      NA       NA
factor(year)2002:factor(ccode)1  -0.6348         NA      NA       NA

In this particular case the NAs seem to result from too many variables in the model, leaving no degrees of freedom. The same issue occurs with a larger dataset. I am not entirely sure what is going wrong here. I assume it has something to do with the way I use factor() to model the unit-specific time trends, but so far I have not been able to solve it.
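One way to check the degrees-of-freedom suspicion is to compare the number of observations with the number of coefficients the model asks for. The sketch below rebuilds the mock data with an arbitrary seed (an assumption, since the draws above are not reproducible) and counts how many coefficients lm() could not estimate. The names n_obs, n_coef, n_alias and resid_df are only illustrative.

```r
# Rebuild the mock data; the seed is an assumption added for reproducibility
set.seed(1)
ccode <- rep(1:3, each = 10)
year  <- rep(2000:2009, 3)
x1 <- rnorm(30) * ccode
x2 <- runif(30)
y  <- 0.5 * x1 - 0.3 * x2 + rnorm(30)
df <- data.frame(ccode, year, y, x1, x2)

# Fully interacted model: one parameter per country-year cell
m1 <- lm(y ~ x1 + x2 + factor(year) * factor(ccode), data = df)

n_obs    <- nrow(df)              # observations available
n_coef   <- length(coef(m1))      # coefficients requested (design-matrix columns)
n_alias  <- sum(is.na(coef(m1)))  # coefficients lm() had to drop as aliased
resid_df <- df.residual(m1)       # residual degrees of freedom

print(c(n_obs = n_obs, n_coef = n_coef, n_alias = n_alias, resid_df = resid_df))
```

If the residual degrees of freedom come out as zero, the fit reproduces the data exactly and no standard errors can be computed, which would explain the NA columns in the summary.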

Does anyone have an idea how to do this properly? Any suggestions are welcome.

Upvotes: 2

Views: 4766

Answers (1)

csgillespie

Reputation: 60492

You are trying to estimate at least as many parameters as you have data points, i.e. n ≤ p. In your example data set, you have

R> nrow(df)
[1] 30

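data points, and the interaction factor(year)*factor(ccode) alone requires 30 parameters (one per country-year cell), so no degrees of freedom are left for standard errors. As Ben points out, you are estimating a different parameter for each year. If you want to assume a linear trend, then just have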

lm(y ~ x1 + factor(ccode)*year, data=df)

or to include a quadratic trend

lm(y ~ x1 + factor(ccode)*I(year^2), data=df)
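As a rough end-to-end sketch of the trend models above, the code below rebuilds the question's mock data with an arbitrary seed (an assumption) and fits country-specific linear and quadratic trends. The column trend (year counted from the first year) and the use of poly() for the quadratic are my own choices for numerical stability, not something taken from the question, and x2 is included to match the question's model.

```r
# Rebuild the mock data; the seed is an assumption added for reproducibility
set.seed(1)
ccode <- rep(1:3, each = 10)
year  <- rep(2000:2009, 3)
x1 <- rnorm(30) * ccode
x2 <- runif(30)
y  <- 0.5 * x1 - 0.3 * x2 + rnorm(30)
df <- data.frame(ccode, year, y, x1, x2)

# Hypothetical helper column: years since the first year (0, 1, ..., 9)
df$trend <- df$year - min(df$year)

# Country-specific linear trend: one intercept shift and one slope per country
m_lin <- lm(y ~ x1 + x2 + factor(ccode) * trend, data = df)

# Country-specific quadratic trend via orthogonal polynomials of degree 2
m_quad <- lm(y ~ x1 + x2 + factor(ccode) * poly(trend, 2), data = df)

# Both models leave residual degrees of freedom, so standard errors exist
print(df.residual(m_lin))
print(df.residual(m_quad))
print(summary(m_lin)$coefficients)
```

Treating year as numeric rather than as a factor is what keeps the parameter count small: each country gets one slope (or two polynomial terms) instead of one dummy per year.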

Upvotes: 5
