David

Reputation: 111

R: mixed models - how to predict a variable using previous values of this same variable

I am struggling with multilevel models, so I have prepared a reproducible example to make my question clear.

Let's say I would like to predict the height of children after 12 months of follow-up, i.e. their height at month == 12, using their previous height values and also their previous weight values, with a data frame like this one:

df <- data.frame(ID = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
                 month = c(1, 3, 6, 12, 1, 6, 12, 1, 6, 8, 12),
                 weight = c(14, 15, 17, 18, 21, 21, 22, 8, 8, 9, 10),
                 height = c(100, 102, 103, 104, 122, 123, 125, 82, 86, 88, 90))

   ID month weight height
1   1     1     14    100
2   1     3     15    102
3   1     6     17    103
4   1    12     18    104
5   2     1     21    122
6   2     6     21    123
7   2    12     22    125
8   3     1      8     82
9   3     6      8     86
10  3     8      9     88
11  3    12     10     90

My plan was to use the following model (obviously I have far more data than 3 patients, and more rows per patient). Because the height measurements are correlated within each patient, I wanted to add a random intercept (1|ID), but also a random slope for month, which is why I added (month|ID) (I saw in several examples of predicting students' scores that the "occasion" or "test day" was added as a random slope). So I used the following code.

library(tidymodels)
library(multilevelmod)
library(lme4)

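# Model specification: a linear mixed model fitted by lme4::lmer (engine provided by multilevelmod)
mixed_model_spec <- linear_reg() %>%
  set_engine("lmer") %>%
  set_args(na.action = na.exclude, control = lmerControl(optimizer = "bobyqa"))

# Fitting the model: fixed effects for weight and month,
# plus a correlated random intercept and random slope for month within each ID
mixed_model_fit <-
  mixed_model_spec %>%
  fit(height ~ weight + month + (month | ID),
      data = df)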
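Written out, the formula height ~ weight + month + (month | ID) used above corresponds to the model below, where i indexes the child and j the measurement occasion. Note that (month | ID) is shorthand for (1 + month | ID), so it already contains the random intercept; besides the residual variance sigma^2, lme4 has to estimate sigma_0, sigma_1 and the correlation rho. A singular fit means that at least one of these is estimated on the boundary of its range (sigma_0 or sigma_1 equal to 0, or rho equal to -1 or +1).

```latex
\begin{aligned}
\text{height}_{ij} &= \beta_0 + \beta_1\,\text{weight}_{ij} + \beta_2\,\text{month}_{ij}
  + u_{0i} + u_{1i}\,\text{month}_{ij} + \varepsilon_{ij},\\
\begin{pmatrix} u_{0i} \\ u_{1i} \end{pmatrix} &\sim
  \mathcal{N}\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},
  \begin{pmatrix} \sigma_0^2 & \rho\,\sigma_0\sigma_1 \\ \rho\,\sigma_0\sigma_1 & \sigma_1^2 \end{pmatrix}\right),
  \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```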

My first problem is that if I add "weight" (which has multiple values per ID) as a variable, I get the following message: "boundary (singular) fit: see help('isSingular')" (even with my large dataset), whereas if I keep only variables with one value per patient (e.g. sex) I do not have this problem. Can anyone explain why?

My second problem is that by training a similar model, I can predict the height of new children at nearly all months (I get a predicted value at month 1, month X, ..., month 12), which I can compare with the real values collected in my test set. However, what I am interested in is predicting the value at month 12 while integrating the previous values of each patient in this test set. In other words, I do not want the model to predict the whole set of values from scratch (more precisely, only from the patient data used for training), but also to use the previous values of the new patient at month 1, month 4, month 6, etc. that are already available. How can I write my code to obtain such a prediction?

Thanks a lot for your help!

Upvotes: 2

Views: 215

Answers (1)

Robert Long

Reputation: 6812

> My first problem is that if I add "weight" (which has multiple values per ID) as a variable, I get the following message: "boundary (singular) fit: see help('isSingular')" (even with my large dataset), whereas if I keep only variables with one value per patient (e.g. sex) I do not have this problem. Can anyone explain why?

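This happens when the random-effects structure is too complex to be supported by the data: a singular fit means that at least one variance component is estimated as zero (or very close to it), or that a correlation between random effects is estimated at -1 or +1. In other words, the model is overfitted. Beyond that, it is usually not possible to identify exactly why this happens in some situations and not others. A few things you can try are: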

  • centering the month variable
  • centering other numeric variables
  • fitting the model without the correlation between random slopes and intercepts, by using || instead of |
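A minimal sketch of these suggestions, assuming the model is fitted directly with lme4 on the toy data from the question (if you stay in tidymodels, parsnip's `extract_fit_engine(mixed_model_fit)` should give you the underlying `lmer` object to run the same checks on). `isSingular()` flags the boundary fit and `VarCorr()` shows which standard deviation or correlation sits on the boundary; the centered variables `month_c` and `weight_c` are names made up for this example.

```r
library(lme4)

df <- data.frame(ID = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
                 month = c(1, 3, 6, 12, 1, 6, 12, 1, 6, 8, 12),
                 weight = c(14, 15, 17, 18, 21, 21, 22, 8, 8, 9, 10),
                 height = c(100, 102, 103, 104, 122, 123, 125, 82, 86, 88, 90))
df$ID <- factor(df$ID)

ctrl <- lmerControl(optimizer = "bobyqa")

# Model from the question: correlated random intercept and random slope for month
fit_full <- lmer(height ~ weight + month + (month | ID), data = df, control = ctrl)
sing_full <- isSingular(fit_full)  # TRUE if a variance or correlation is on the boundary
print(VarCorr(fit_full))           # look for SDs near 0 or a correlation near -1 or +1

# Suggestions 1 and 2: center the numeric predictors
df$month_c  <- df$month - mean(df$month)
df$weight_c <- df$weight - mean(df$weight)

# Suggestion 3: drop the intercept-slope correlation with ||
# (for a numeric month_c this expands to (1 | ID) + (0 + month_c | ID))
fit_uncor <- lmer(height ~ weight_c + month_c + (month_c || ID), data = df, control = ctrl)
sing_uncor <- isSingular(fit_uncor)
print(VarCorr(fit_uncor))

print(c(full = sing_full, uncorrelated = sing_uncor))
```

With only three children the toy fits say little either way; on the real data, if the uncorrelated model is still singular, a further simplification is to drop the random slope and keep only (1 | ID).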

There are also some related questions and answers here:

https://stats.stackexchange.com/questions/378939/dealing-with-singular-fit-in-mixed-models/379068#379068

https://stats.stackexchange.com/questions/509892/why-is-this-linear-mixed-model-singular/509971#509971

As for the second question, it sounds like you want some kind of time-series model. An autoregressive model such as AR(1) for the within-child residuals might be sufficient, but residual correlation structures like this are not supported by lme4. You could try nlme instead.
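A sketch of the nlme route, under several assumptions: the data are simulated (hypothetical values, not the question's data), the model has a random intercept per child plus a continuous-time AR(1) residual correlation (`corCAR1`, chosen here because the visit months are unequally spaced), and the helper `predict_last_visit()` is hypothetical. It predicts the month-12 height of a new child by conditioning on that child's earlier heights, using the conditional expectation x_12'beta + c' V^-1 (y_earlier - X_earlier beta) under the fitted model, with the fitted parameters treated as known. The conditioning is done by hand rather than through `predict()`, because the new child is not in the fitting data.

```r
library(nlme)

set.seed(42)
visits   <- c(1, 3, 6, 9, 12)  # hypothetical visit months
phi_true <- 0.8                # hypothetical residual correlation for visits 1 month apart

# Simulate one hypothetical child: random intercept + CAR(1) residuals
sim_child <- function(id) {
  weight <- 10 + rnorm(1, 0, 3) + 0.5 * visits + rnorm(length(visits), 0, 1)
  u <- rnorm(1, 0, 8)
  e <- numeric(length(visits))
  e[1] <- rnorm(1, 0, 1.5)
  for (k in 2:length(visits)) {
    r <- phi_true^(visits[k] - visits[k - 1])
    e[k] <- r * e[k - 1] + sqrt(1 - r^2) * rnorm(1, 0, 1.5)  # keeps variance stationary
  }
  data.frame(ID = id, month = visits, weight = weight,
             height = 80 + 1.2 * weight + 0.3 * visits + u + e)
}

train <- do.call(rbind, lapply(1:60, sim_child))

# Random intercept per child, continuous-time AR(1) correlation within child
fit_ar <- lme(height ~ weight + month,
              random = ~ 1 | ID,
              correlation = corCAR1(form = ~ month | ID),
              data = train)

# Hypothetical helper: E[height at target month | earlier heights of the same child]
predict_last_visit <- function(fit, earlier, target) {
  beta   <- fixef(fit)
  sigma2 <- fit$sigma^2                          # residual variance
  tau2   <- as.numeric(getVarCov(fit))           # random-intercept variance
  phi    <- unname(coef(fit$modelStruct$corStruct, unconstrained = FALSE))
  X_e <- model.matrix(~ weight + month, earlier)
  x_t <- model.matrix(~ weight + month, target)
  t_e <- earlier$month
  # Covariance among earlier heights, and between earlier heights and the target
  V_e <- tau2 + sigma2 * phi^abs(outer(t_e, t_e, "-"))
  c_t <- tau2 + sigma2 * phi^abs(target$month - t_e)
  resid_e <- earlier$height - drop(X_e %*% beta)
  drop(x_t %*% beta) + sum(c_t * solve(V_e, resid_e))
}

new_child <- sim_child(999)
earlier   <- subset(new_child, month < 12)
target    <- subset(new_child, month == 12)

pred_conditional <- predict_last_visit(fit_ar, earlier, target)
pred_population  <- drop(model.matrix(~ weight + month, target) %*% fixef(fit_ar))
print(c(observed = target$height, conditional = pred_conditional,
        population = pred_population))
```

The helper assumes a random intercept only; with a random slope for month, the term `tau2` in `V_e` and `c_t` would have to be replaced by the corresponding entries of Z G Z'.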

Upvotes: 2
