iPlexpen

Reputation: 409

R command structure for Longitudinal data with level 1 and level 2 variables

I have children's test scores and demographics for different years (longitudinal data) and need to run a couple of comparison models on them. I'm confused about how to set up the level 1 and level 2 variables in R.

My dataframe (df):

Student  Year  Gender Race MathScore DepressionScore MemoryScore
1        1999   M      C     80            15            80
1        2000   M      C     81            25            60
1        2001   M      C     70            50            75
2        1999   F      C     65            15            99
2        2000   F      C     70            31            98
2        2001   F      C     71            30            99
3        1999   F      AA    92            10            90
3        2000   F      AA    89            10            91
3        2001   F      AA    85            26            80

I want to run at least two models and compare them, but I'm not sure how to separate the time-varying covariates from the time-invariant ones. I've tried these:

summary(fix <-lme(MathScore ~ Gender+Race+DepressionScore+MemoryScore, random= Year|Student, data=df, na.action="na.omit")

summary(fix2 <- lme(MathScore ~ 1+Gender+Race+DepressionScore+MemoryScore, random=~1|Year, data=df, na.action=na.omit)) 

My questions are:

  1. In "fix", are all covariates supposed to follow the first tilde, and should the random part be Year|Student?

  2. How can I specify that DepressionScore and MemoryScore vary by year and by student too?

  3. Is fix2 supposed to have "random=~1+Student|Year" or just "random=~1|Year"?

Upvotes: 0

Views: 521

Answers (1)

Erdne Htábrob

Reputation: 879

You have years nested in students, so the command for the random intercept model should be:

summary(fix <- lme(MathScore ~ Gender+Race+DepressionScore+MemoryScore, random = ~1|Student/Year, data=df, na.action="na.omit"))

To separate the time-varying from the time-invariant effects, you need to split each time-varying covariate into two components that get separate estimates (see: Fairbrother, M., 2014. Two Multilevel Modeling Techniques for Analyzing Comparative Longitudinal Survey Datasets. Political Science Research and Methods 2, 119–140. doi:10.1017/psrm.2013.24)

This requires centering the time-varying variables within students: take each student's mean of the variable to capture the persistent 'student' effect, then subtract that mean from the original variable to isolate the time-varying part. Without the dataset I am not sure whether this works, but try something like

library(plyr)
df <- ddply(df, "Student", transform,
            mean.std.DepressionScore = mean(DepressionScore),
            mean.std.MemoryScore = mean(MemoryScore))

df$time.DepressionScore <- df$DepressionScore-df$mean.std.DepressionScore
df$time.MemoryScore<- df$MemoryScore-df$mean.std.MemoryScore
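As a check on what the decomposition produces, here is a minimal sketch of the same two steps using base R's ave() in place of ddply. It is only an illustration: it rebuilds just the Student, Year and DepressionScore columns from the example rows in the question, and ave() is a substitute I am using here, not part of the code above.

```r
# Student, Year and DepressionScore copied from the example rows in the question
df <- data.frame(
  Student = rep(1:3, each = 3),
  Year = rep(1999:2001, times = 3),
  DepressionScore = c(15, 25, 50, 15, 31, 30, 10, 10, 26)
)

# Between-student part: each student's own mean, repeated on every row
# (ave() applies mean() within each group by default)
df$mean.std.DepressionScore <- ave(df$DepressionScore, df$Student)

# Within-student part: deviation of each year from that student's mean
df$time.DepressionScore <- df$DepressionScore - df$mean.std.DepressionScore

print(df)
```

For student 1 the scores 15, 25, 50 have a mean of 30, so the within-student values are -15, -5 and 20. The within-student column sums to zero for every student, which is what makes it independent of the student mean.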

then the model becomes:

summary(fix <- lme(MathScore ~ Gender+Race+mean.std.DepressionScore+time.DepressionScore+mean.std.MemoryScore+time.MemoryScore + Year, random = ~1|Student/Year, data=df, na.action="na.omit"))

In this model the mean.std coefficients estimate the time-persistent differences 'between' students, while the time. coefficients measure over-time changes 'within' students. You need the fixed effect for Year to control for trends that could otherwise be picked up by both the time-persistent and the time-varying estimates.
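Since the question also asks about comparing models, here is a hedged sketch of how a between/within model could be fitted and compared against a reduced one. Everything in it is an assumption for illustration: the data are simulated (the nine example rows are too few to estimate these terms), the names sim, YearC, fit_reduced and fit_full are hypothetical, and only one time-varying covariate is used. It also uses a student-only random intercept, because with a single row per student and year a Year-within-Student intercept cannot be told apart from the residual.

```r
library(nlme)  # lme() and its anova() method

set.seed(1)
n_students <- 40
years <- 1999:2001

# Simulated stand-in data (hypothetical; not the asker's dataset).
# expand.grid varies Year fastest, so each student's rows are contiguous.
sim <- expand.grid(Year = years, Student = seq_len(n_students))
sim$Gender <- factor(rep(sample(c("F", "M"), n_students, replace = TRUE),
                         each = length(years)))
sim$DepressionScore <- rnorm(nrow(sim), mean = 25, sd = 10)
student_effect <- rep(rnorm(n_students, sd = 5), each = length(years))
sim$MathScore <- 80 - 0.2 * sim$DepressionScore + student_effect +
  rnorm(nrow(sim), sd = 3)

# Between/within split of the time-varying covariate
sim$mean.std.DepressionScore <- ave(sim$DepressionScore, sim$Student)
sim$time.DepressionScore <- sim$DepressionScore - sim$mean.std.DepressionScore

# Year counted from the first wave, to keep the intercept interpretable
sim$YearC <- sim$Year - min(sim$Year)

# Fit with ML (not REML) because the two models differ in their fixed effects
fit_reduced <- lme(MathScore ~ Gender + mean.std.DepressionScore + YearC,
                   random = ~ 1 | Student, data = sim, method = "ML")
fit_full <- lme(MathScore ~ Gender + mean.std.DepressionScore +
                  time.DepressionScore + YearC,
                random = ~ 1 | Student, data = sim, method = "ML")

# Likelihood-ratio comparison of the two nested models
print(anova(fit_reduced, fit_full))
```

The time-invariant covariates (Gender here, Race in the original data) need no special treatment: they simply enter the fixed part, and lme works out from the grouping that they only vary between students.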

Upvotes: 4
