Reputation: 409
I have children's test scores and demographics for different years (longitudinal data) and need to run a couple of comparison models on them. I'm confused about how to set up the level 1 and level 2 variables in R.
My dataframe (df):
Student Year Gender Race MathScore DepressionScore MemoryScore
1 1999 M C 80 15 80
1 2000 M C 81 25 60
1 2001 M C 70 50 75
2 1999 F C 65 15 99
2 2000 F C 70 31 98
2 2001 F C 71 30 99
3 1999 F AA 92 10 90
3 2000 F AA 89 10 91
3 2001 F AA 85 26 80
I want to run at least two models and compare them, but I'm not sure how to separate the time-varying covariates from the time-invariant ones. I've tried these:
summary(fix <- lme(MathScore ~ Gender+Race+DepressionScore+MemoryScore, random= Year|Student, data=df, na.action="na.omit"))
summary(fix2 <- lme(MathScore ~ 1+Gender+Race+DepressionScore+MemoryScore, random=~1|Year, data=df, na.action=na.omit))
My questions are:
1. In "fix", are all covariates supposed to follow the first tilde, and should the random part be Year|Student?
2. How can I specify that DepressionScore and MemoryScore also vary by year and student?
3. Is fix2 supposed to have "random=~1+Student|Year" or just "random=~1|Year"?
Upvotes: 0
Views: 521
Reputation: 879
You have years nested in students, so the command for the random intercept model should be:
summary(fix <- lme(MathScore ~ Gender+Race+DepressionScore+MemoryScore, random= ~1|Student/Year, data=df, na.action="na.omit"))
To separate the time-varying and the time-invariant effects, you need to split the estimate for each time-varying covariate into two parts (see: Fairbrother, M., 2014. Two Multilevel Modeling Techniques for Analyzing Comparative Longitudinal Survey Datasets. Political Science Research and Methods 2, 119–140. doi:10.1017/psrm.2013.24)
What this requires is to take the mean of each time-varying variable by student, which captures the persistent 'student' effect, and then subtract that mean from the original variable to isolate the time-varying part. Without the dataset I am not sure whether this works, but try something like
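library(plyr)  # provides ddply
# Per-student means; assign the result back to df so the new columns are kept
df <- ddply(df, "Student", transform, mean.std.DepressionScore = mean(DepressionScore))
df <- ddply(df, "Student", transform, mean.std.MemoryScore = mean(MemoryScore))
# Deviations of each yearly score from the student's own mean
df$time.DepressionScore <- df$DepressionScore - df$mean.std.DepressionScore
df$time.MemoryScore <- df$MemoryScore - df$mean.std.MemoryScore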
then the model becomes:
summary(fix <- lme(MathScore ~ Gender+Race+mean.std.DepressionScore+time.DepressionScore+mean.std.MemoryScore+time.MemoryScore + Year, random= ~1|Student/Year, data=df, na.action="na.omit"))
In this model the mean.std.* coefficients estimate the time-persistent differences 'between' students, while the time.* coefficients measure over-time changes 'within' students. You need the fixed-effect estimate for Year to control for trends that could affect the time-persistent and the time-varying effects equally.
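As a quick way to check the centering step, here is a minimal base-R sketch of the same between/within split, using only the rows posted in the question. It swaps in `ave()` for `ddply()`, which is my substitution rather than part of the approach above, and it only builds the variables. It does not fit the model, since nine rows are too few to estimate it.

```r
# Sample rows copied from the question (three students, three years each).
df <- data.frame(
  Student         = rep(1:3, each = 3),
  Year            = rep(1999:2001, times = 3),
  DepressionScore = c(15, 25, 50, 15, 31, 30, 10, 10, 26),
  MemoryScore     = c(80, 60, 75, 99, 98, 99, 90, 91, 80)
)

# Between-student part: each student's mean over all years.
# ave() returns the group mean repeated on every row of the group.
df$mean.std.DepressionScore <- ave(df$DepressionScore, df$Student)
df$mean.std.MemoryScore     <- ave(df$MemoryScore, df$Student)

# Within-student part: deviation of each yearly score from the student's mean.
df$time.DepressionScore <- df$DepressionScore - df$mean.std.DepressionScore
df$time.MemoryScore     <- df$MemoryScore - df$mean.std.MemoryScore

# Sanity check: within-student deviations should sum to (numerically) zero.
within_sums <- tapply(df$time.DepressionScore, df$Student, sum)
print(within_sums)
```

If the per-student sums are not essentially zero on your real data, the means were probably computed over the wrong grouping, or missing values turned a student's mean into NA.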
Upvotes: 4