Reputation: 103
I want to create a linear mixed-effects model which indexes by both ID and year. I have never worked with longitudinal data before and I'm not sure how to approach modelling the data I have.
The data set looks like this:
data <- data.frame(ID = c(1,1,1,1,2,2,3,3,3),
year = c(1,2,3,4,1,2,1,2,3),
score1 = c(90,78,92,69,86,73,82,85,91),
score2 = c(89,89,72,98,95,92,94,87,89),
score3 = c(91,88,93,89,90,78,87,98,89))
ID, year, score1, score2, score3
1, 1, 90, 89, 91
1, 2, 78, 89, 88
1, 3, 92, 72, 93
1, 4, 69, 98, 89
2, 1, 86, 95, 90
2, 2, 73, 92, 78
3, 1, 82, 94, 87
3, 2, 85, 87, 98
3, 3, 91, 89, 89
The data is unbalanced (different number of years for each ID). The year variable indicates the n-th year that the specific ID was observed.
I want to create a linear mixed-effects model of score1 as a function of score2 + score3, indexed by the variables ID and year.
This is how I approached the code using the package lme4:
lmer(score1 ~ (score2 + score3 | ID/year), data=data)
Is this possible to do in R?
Reputation: 226182
Your current version won't work, but you have a few choices. I believe the maximal model is
score1 ~ 1 + score2 + score3 + factor(year) + (1 + score2 + score3 + factor(year) | ID)
which estimates the population-level effects of score2, score3, and the sampling sequence (year) on score1, and allows all of the following to vary among IDs: the intercept (i.e. the expected score1 when score2 == score3 == 0 in year 1); the effects of score2 and score3 on score1; and the effect of year of sampling.
This formulation is actually slightly overspecified (and lmer will complain, which you would have to override using the control argument) because there is only one observation per ID/year combination, and we have specified a separate random effect for each one, so the year|ID effects and the residual variance are jointly unidentifiable.
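To see the problem concretely, here is a minimal base-R sketch using the toy data from the question. It counts the observations in each ID/year cell and compares the number of rows with the number of random-effect values the maximal term would need. The names cell_counts, n_re_per_id and n_re are just illustrative.

```r
# Toy data copied from the question
data <- data.frame(ID = c(1,1,1,1,2,2,3,3,3),
                   year = c(1,2,3,4,1,2,1,2,3),
                   score1 = c(90,78,92,69,86,73,82,85,91),
                   score2 = c(89,89,72,98,95,92,94,87,89),
                   score3 = c(91,88,93,89,90,78,87,98,89))

# Number of observations in every ID/year cell (0 where an ID was not observed)
cell_counts <- with(data, table(ID, year))
max(cell_counts)   # 1: no ID/year combination is observed more than once

# The term (1 + score2 + score3 + factor(year) | ID) needs one value per
# coefficient of the inner formula for each ID
n_re_per_id <- ncol(model.matrix(~ 1 + score2 + score3 + factor(year), data))
n_re <- n_re_per_id * length(unique(data$ID))

c(observations = nrow(data), random_effects = n_re)   # 9 observations vs 18
```

With more random-effect values than rows, there is nothing left over to estimate the residual variance separately.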
A more typical model would be
score1 ~ 1 + score2 + score3 + factor(year) + (1 + score2 + score3 | ID) + (1|ID:year)
This version assumes that the variation among years is compound symmetric, i.e. that the variability in each year is the same and that every pair of years within ID values has the same correlation. However, it is still overspecified: for the same reasons described above (i.e. there is only a single observation per ID/year combination), the (1|ID:year) term is confounded with the residual variation.
Therefore, I would recommend
score1 ~ 1 + score2 + score3 + factor(year) + (1 + score2 + score3 | ID)
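As a rough sketch of how this could be fitted: the nine-row example from the question is too small to support it (three random effects for each of three IDs already equals the number of rows), so the code below simulates a larger unbalanced data set with the same column names. Everything about the simulation (40 IDs, 4 to 8 years each, the coefficients and standard deviations) is an assumption made up for illustration, not something taken from the real data.

```r
library(lme4)

set.seed(101)
n_id <- 40
# Unbalanced design: each hypothetical ID is observed for 4 to 8 years
n_years <- sample(4:8, n_id, replace = TRUE)
id_idx <- rep(seq_len(n_id), times = n_years)

sim <- data.frame(ID = factor(id_idx),
                  year = sequence(n_years))   # 1, 2, ... within each ID
sim$score2 <- rnorm(nrow(sim), mean = 90, sd = 6)
sim$score3 <- rnorm(nrow(sim), mean = 90, sd = 6)

# Made-up data-generating process: ID-level intercept shifts plus noise
id_shift <- rnorm(n_id, sd = 5)
sim$score1 <- 20 + 0.4 * sim$score2 + 0.3 * sim$score3 -
  1.5 * (sim$year - 1) + id_shift[id_idx] + rnorm(nrow(sim), sd = 4)

# The recommended model: fixed effects of score2, score3 and year,
# with intercept and score slopes varying among IDs
fit <- lmer(score1 ~ 1 + score2 + score3 + factor(year) +
              (1 + score2 + score3 | ID),
            data = sim)
summary(fit)
```

Because the simulated slopes do not actually vary among IDs, you may well see a singular-fit message here; with real data, centering score2 and score3 before fitting often makes the random-slope model easier to estimate.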