Reputation: 3
I have a question on analyzing longitudinal data in R.
To provide a little context, my dataset is in long format, organized by subject ('ID') across 3 time-points. Importantly, my data is unbalanced: some subjects have complete observations across all 3 waves, whilst others only have data for 1 or 2 waves. A simplified example is presented below:
ID timepoint outcome predictor
001 1 244 100
001 2 305 144
002 1 122 200
002 2 266 120
002 3 308 118
003 2 311 129
003 3 411 126
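For anyone reproducing this, here is a minimal sketch that enters the example above as an R data frame. It assumes only base R. IDs are kept as character strings so that the leading zeros survive. It also counts the waves per subject and lists the subjects with no timepoint-1 row, which makes the imbalance explicit.

```r
# Example data from the question in long format (one row per subject per wave).
# IDs are stored as character strings so the leading zeros are kept.
df <- data.frame(
  ID        = c("001", "001", "002", "002", "002", "003", "003"),
  timepoint = c(1, 2, 1, 2, 3, 2, 3),
  outcome   = c(244, 305, 122, 266, 308, 311, 411),
  predictor = c(100, 144, 200, 120, 118, 129, 126)
)

# Number of waves observed per subject (2, 3 and 2): the panel is unbalanced.
waves_per_id <- table(df$ID)
print(waves_per_id)

# Subjects with no timepoint-1 row, i.e. no baseline predictor score.
no_baseline <- setdiff(unique(df$ID), df$ID[df$timepoint == 1])
print(no_baseline)
```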
I'm planning to run a generalized additive model (via the mgcv package) to examine whether change in my continuous DV ('outcome') across the 3 waves can be significantly predicted using only the predictor scores from timepoint 1.
So essentially my desired model would look like the following:
library(mgcv)
model1 <- gam(outcome ~ s(predictor_scores_at_timepoint1, k = 4), data = df, method = 'ML')
Is there an intuitive way to go about this?
Many thanks! :)
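Here is a minimal sketch of one way to build the timepoint-1 predictor column that the model above refers to. It assumes base R only. The approach takes each subject's timepoint-1 row and left-joins its `predictor` value onto every row for that subject. The column name `predictor_t1` is hypothetical and stands in for `predictor_scores_at_timepoint1`. Because the data are unbalanced, a subject with no timepoint-1 row (ID 003 in the example) ends up with `NA`.

```r
# Toy data from the question.
df <- data.frame(
  ID        = c("001", "001", "002", "002", "002", "003", "003"),
  timepoint = c(1, 2, 1, 2, 3, 2, 3),
  outcome   = c(244, 305, 122, 266, 308, 311, 411),
  predictor = c(100, 144, 200, 120, 118, 129, 126)
)

# One row per subject: the predictor score measured at timepoint 1.
baseline <- df[df$timepoint == 1, c("ID", "predictor")]
# predictor_t1 is a hypothetical name for the baseline score column.
names(baseline)[names(baseline) == "predictor"] <- "predictor_t1"

# Left join on ID: every wave of a subject gets that subject's timepoint-1 score.
# Subjects without a timepoint-1 row (ID 003 here) get NA.
df <- merge(df, baseline, by = "ID", all.x = TRUE)
print(df)
```

The toy data has only two subjects with a baseline score, which is too few to fit `s(predictor_t1, k = 4)`. With the real data, `gam(outcome ~ s(predictor_t1, k = 4), data = df, method = 'ML')` could then be fitted. Under R's usual default `na.action` (`na.omit`), rows where `predictor_t1` is `NA` would be dropped.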
Upvotes: 0
Reputation: 2414
I think the easiest approach is to split your data into a training set (timepoint 1) and a test set (timepoints 2 and 3):
train <- df[df$timepoint == 1, ]  # wave 1 only, used to fit the model
test <- df[df$timepoint > 1, ]    # waves 2 and 3, used for prediction
Then fit your GAM on the training set:
library(mgcv)
model1 <- gam(outcome ~ s(predictor, k = 4), data = train, method = 'ML')
Then predict the outcome for the later waves:
test$predictions <- predict(model1, newdata = test, type = 'response')
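The example in the question has only two rows at timepoint 1, which is too few for `s(predictor, k = 4)`. The sketch below therefore runs the steps above end to end on simulated data. The data-generating formula, the number of subjects, and the 30 randomly dropped rows (used to make the panel unbalanced) are arbitrary assumptions chosen for illustration only. The split, fit and predict steps are the same as above.

```r
library(mgcv)

set.seed(42)

# Hypothetical simulated data (illustration only): 60 subjects x 3 waves.
n_id <- 60
df <- data.frame(
  ID        = rep(sprintf("%03d", seq_len(n_id)), each = 3),
  timepoint = rep(1:3, times = n_id),
  predictor = runif(3 * n_id, min = 100, max = 200)
)
df$outcome <- 100 + 1.5 * df$predictor + 40 * df$timepoint +
  rnorm(nrow(df), sd = 20)

# Make the panel unbalanced by randomly dropping 30 rows.
df <- df[-sample(nrow(df), 30), ]

# Train on wave 1, test on waves 2 and 3.
train <- df[df$timepoint == 1, ]
test  <- df[df$timepoint > 1, ]

model1 <- gam(outcome ~ s(predictor, k = 4), data = train, method = "ML")
test$predictions <- predict(model1, newdata = test, type = "response")

# Observed vs predicted values for the later waves.
print(head(test[, c("ID", "timepoint", "outcome", "predictions")]))
```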
Upvotes: 0