codemachino

Reputation: 103

Mixed effects model on panel data with 2 indexing variables using lme4?

I want to create a linear mixed-effects model which indexes by both ID and year. I have never worked with longitudinal data before and I'm not sure how to approach modelling the data I have.

The data set looks like this:

data <- data.frame(ID = c(1,1,1,1,2,2,3,3,3),
                   year = c(1,2,3,4,1,2,1,2,3),
                   score1 = c(90,78,92,69,86,73,82,85,91),
                   score2 = c(89,89,72,98,95,92,94,87,89),
                   score3 = c(91,88,93,89,90,78,87,98,89))

ID, year, score1, score2, score3
1, 1, 90, 89, 91
1, 2, 78, 89, 88
1, 3, 92, 72, 93
1, 4, 69, 98, 89
2, 1, 86, 95, 90
2, 2, 73, 92, 78
3, 1, 82, 94, 87
3, 2, 85, 87, 98
3, 3, 91, 89, 89

The data is unbalanced (different number of years for each ID). The year variable indicates the n-th year in which the specific ID was observed.

I want to fit a linear mixed-effects model of score1 as a function of score2 + score3, indexed by the variables ID and year.

This is how I approached the code using the package lme4:

lmer(score1 ~ (score2 + score3 | ID/year), data=data)

Is this possible to do in R?

Upvotes: 0

Views: 368

Answers (1)

Ben Bolker

Reputation: 226182

Your current version won't work, but you have a few choices. I believe the maximal model is

score1 ~ 1 + score2 + score3 + factor(year) + (1 + score2 + score3 + factor(year) | ID)

which evaluates

  • the population-level effect of score2, score3, and the sampling sequence (year)
  • the variation among individuals in the intercept (expected value of score1 when score2 == score3 == 0 in year 1); the effects of score2 and score3 on score1; and the effect of year of sampling.

This formulation is actually slightly overspecified (by default lmer will stop with an error, which you would have to override using the control argument) because there is only one observation per ID/year combination and we have specified a separate random effect for each one, so the year|ID effects and the residual variance are jointly unidentifiable.
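The "one observation per ID/year combination" condition can be checked directly on the data before fitting anything. The following is a minimal base-R sketch using the ID and year columns from the question; the names cell_counts and n_cells are introduced here only for illustration.

```r
# ID and year columns copied from the question's toy data set
data <- data.frame(ID   = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
                   year = c(1, 2, 3, 4, 1, 2, 1, 2, 3))

# Cross-tabulate ID by year: each cell holds the number of rows
# observed for that ID/year combination
cell_counts <- table(data$ID, data$year)
max(cell_counts)

# Number of distinct ID/year combinations compared with the number of rows
n_cells <- nrow(unique(data[, c("ID", "year")]))
n_cells == nrow(data)
```

If no cell holds more than one row and the number of combinations equals the number of rows, any random effect defined at the ID/year level has exactly one observation per level and cannot be separated from the residual term.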

A more typical model would be

score1 ~ 1 + score2 + score3 + factor(year) + (1 + score2 + score3 | ID) + (1|ID:year)

This version assumes that the variation among years is compound symmetric, i.e. that the variability in each year is the same and that every pair of years within ID values has the same correlation. However, it is still overspecified — for the same reasons described above (i.e. there is only a single observation per ID/year combination), the (1|ID:year) term is confounded with the residual variation.

Therefore, I would recommend

score1 ~ 1 + score2 + score3 + factor(year) + (1 + score2 + score3 | ID)
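As a rough sketch of how this recommended formula might be fitted, the code below simulates a larger unbalanced panel, because the nine-row example from the question has too few observations for three random effects per ID. Everything about the simulated data is an assumption made for illustration: the number of IDs, the 4 to 6 years per ID, the standardized score2 and score3, and the coefficients used to generate score1.

```r
library(lme4)

set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical unbalanced panel: 40 IDs, each observed for 4 to 6 years
n_id    <- 40
n_years <- sample(4:6, n_id, replace = TRUE)
sim <- data.frame(ID   = rep(seq_len(n_id), times = n_years),
                  year = sequence(n_years))  # 1, 2, ..., n within each ID

# Assumed predictors: scores already centred and scaled
sim$score2 <- rnorm(nrow(sim))
sim$score3 <- rnorm(nrow(sim))

# Made-up data-generating process: the intercept and the score2 slope
# vary by ID; the score3 slope is the same for every ID
id_intercept <- rnorm(n_id, mean = 80, sd = 5)
id_slope2    <- rnorm(n_id, mean = 3, sd = 1.5)
sim$score1 <- id_intercept[sim$ID] + id_slope2[sim$ID] * sim$score2 +
  2 * sim$score3 + rnorm(nrow(sim), sd = 4)

# Recommended model: population-level effects of score2, score3 and year,
# plus ID-level variation in the intercept and both slopes
fit <- lmer(score1 ~ 1 + score2 + score3 + factor(year) +
              (1 + score2 + score3 | ID), data = sim)

fixef(fit)    # population-level estimates
VarCorr(fit)  # among-ID standard deviations and correlations
```

With only a handful of observations per ID, lmer may report a singular fit or a convergence warning for a model like this; that would suggest dropping one of the random slopes rather than indicating a coding mistake.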

Upvotes: 1
