originalJim

Reputation: 13

Predict new values using mixed model using lme() in R

I have the following data:

str(growth_data)
tibble [92 × 4] (S3: tbl_df/tbl/data.frame)
 $ person: num [1:92] 1 1 1 1 2 2 2 2 3 3 ...
 $ gender: chr [1:92] "F" "F" "F" "F" ...
 $ growth: num [1:92] 21 20 21.5 23 21 21.5 24 25.5 20.5 24 ...
 $ age   : Factor w/ 4 levels "8","10","12",..: 1 2 3 4 1 2 3 4 1 2 ...

And from this, using the lme() function in the nlme package, I have created the following model:

# Fitting a mixed model with a random coefficient and unstructured covariance structure.
unstructured_rand <- nlme::lme(growth ~ gender*age, 
                     random = ~ age | person, 
                     data=growth_data, 
                     correlation = corSymm())

I am trying to produce a set of predictions for new age values, not in my data, for persons in my data. Specifically, I want to produce a prediction for person 1 at age 13.

I have tried, in vain, to use the predict() function whilst specifying the newdata argument, like so:

newGrowth <- expand.grid(
  person = unique(growth_data$person),
  gender = c("F","M"),
  age = c(13,15,17,20)
)

newGrowth$Predicted_Response <- predict(unstructured_rand, newdata = newGrowth)

However, I keep running into the following error:

Error in `Names<-.pdMat`(`*tmp*`, value = value[[i]]) : 
  Length of names should be 4

This seems to be suggesting that my newdata does not have the correct number of columns, but from all other posts on this subject, I have never seen anyone specify a newdata dataframe with the correct number of columns. Further, the only column in my data that is not in the newdata dataframe is growth, which is the variable I am trying to predict.

What am I missing? There seems to be some obvious element from the documentation on predict.lme() that I am failing to apply to my data, but I cannot figure out what it is.

Any help would be much appreciated!

Upvotes: 1

Views: 301

Answers (1)

the-mad-statter

Reputation: 8886

One issue (or maybe the issue at hand) is that you fit a model on data where age was a factor and then tried to predict on data where age was continuous.

Because you did not supply your data, I can't be certain this is the same issue. But the Orthodont data is similar to yours, and this produces an error with the same wording.

Similar Error

library(nlme)

# make some data like yours
orthodont <- Orthodont
orthodont$age <- factor(orthodont$age)

# fit a model similar to yours
fm1 <- lme(distance ~ age, orthodont, random = ~ age | Subject)

# make some new data like your new data
newOrth <- data.frame(Sex = c("Male","Male","Female","Female","Male","Male"),
                      age = c(15, 20, 10, 12, 2, 4),
                      Subject = c("M01","M01","F30","F30","M04","M04"))

# attempt prediction and notice same error
predict(fm1, newOrth, level = 0:1)
#> Warning in model.frame.default(formula = asOneFormula(formula(reSt), fixed), :
#> variable 'age' is not a factor
#> Error in `Names<-.pdMat`(`*tmp*`, value = value[[i]]): Length of names should be 4
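
The "4" in that message most likely refers to the number of random-effect terms the fitted model expects, not to the number of columns in newdata. As a rough sketch (the toy ages below are assumptions chosen to mirror the question, and this only inspects design matrices rather than nlme internals), a four-level factor age expands to four columns, while a numeric age expands to two:

```r
# Toy ages (assumption): four measurement occasions, as in the question
age_factor  <- factor(c(8, 10, 12, 14))
age_numeric <- c(13, 15, 17, 20)

# Factor age: intercept + 3 dummy columns = 4 terms
X_fit <- model.matrix(~ age, data.frame(age = age_factor))
colnames(X_fit)

# Numeric age: intercept + 1 slope = 2 terms
X_new <- model.matrix(~ age, data.frame(age = age_numeric))
colnames(X_new)
```

With `random = ~ age | Subject`, the model fitted on the factor stores a 4 x 4 random-effects covariance, so a numeric age in newdata supplies only two names where four are expected.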

A Fix

Fit a model on data with a continuous age variable and use that for prediction, especially because you are trying to extrapolate beyond the ages on which the model was fit.
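
One detail worth checking when converting the factor: calling as.numeric() directly on a factor returns the internal level codes, not the ages. A small sketch with made-up ages (an assumption, not the asker's data) shows why the conversion goes through as.character() first:

```r
# Made-up ages stored as a factor (assumption for illustration)
age_f <- factor(c(8, 10, 12, 14, 8, 10))

# Direct conversion returns level codes (1, 2, 3, ...), not ages
codes <- as.numeric(age_f)

# Going through character recovers the original ages
ages <- as.numeric(as.character(age_f))

codes
ages
```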

# change factor to numeric to match new data
orthodont$age <- as.numeric(as.character(orthodont$age))

# refit
fm2 <- lme(distance ~ age, orthodont, random = ~ age | Subject)

# attempt prediction again
predict(fm2, newOrth, level = 0:1)
#>   Subject predict.fixed predict.Subject
#> 1     M01      26.66389        30.95074
#> 2     M01      29.96481        35.33009
#> 3     F30      23.36296              NA
#> 4     F30      24.68333              NA
#> 5     M04      18.08148        20.95016
#> 6     M04      19.40185        22.13877

Created on 2024-05-03 with reprex v2.1.0
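
To get closer to the model in the question (a gender-by-age interaction, with predictions for existing persons at new ages), something like the following might work. This is only a sketch: Orthodont stands in for growth_data (Subject for person, Sex for gender, distance for growth), the names `orth`, `fit`, `subjects` and `new_data` are made up here, and the corSymm() term from the question is left out. Note that it keeps each subject's own Sex rather than crossing every person with both genders, as the expand.grid() call in the question does.

```r
library(nlme)

# Stand-in data (assumption): Orthodont plays the role of growth_data
orth <- as.data.frame(Orthodont)
stopifnot(is.numeric(orth$age))  # age must be numeric to predict at unseen ages

# Fixed gender-by-age interaction, random intercept and slope per subject
fit <- lme(distance ~ Sex * age, data = orth, random = ~ age | Subject)

# One row per subject, keeping that subject's own Sex
subjects <- unique(orth[, c("Subject", "Sex")])

# No shared columns, so merge() returns every subject crossed with every new age
new_data <- merge(subjects, data.frame(age = c(13, 15)))

# level = 1 adds each subject's estimated random effects to the fixed-effect prediction
new_data$predicted <- predict(fit, newdata = new_data, level = 1)

# Prediction for a single subject at age 13
subset(new_data, Subject == "M01" & age == 13)
```

Because every Subject in `new_data` was present when the model was fit, the subject-level predictions should not come back as NA the way they do for the unseen "F30" above.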

Upvotes: 1
