JamesLancaster

Reputation: 59

(MICE) missing data Imputation for dataset with Time factors (longitudinal data)

I have a question about the mice function. I have a longitudinal dataset of 4,500 participants with missing values. Some of the variables are measured over time (at 0, 2, 3, 5, etc.), but some of those measurements are missing. Some of the variables are MAR, so I am trying to impute the missing values while taking the time-varying nature of the variables into account.

The data is in long format (the output of dput(head(PaParty)) is below).

The data frame is called "PaParty".

structure(list(id = c(8, 8, 8, 8, 11, 11), mostid = c("M0008", 
"M0008", "M0008", "M0008", "M0011", "M0011"), sex = c(1, 1, 1, 
1, 0, 0), age = c(69, 69, 69, 69, 64, 64), race = c(1, 1, 1, 
1, 1, 1), LeftEyeReplace = c(1, 1, 1, 1, 
    0, 0), Mnths_L_Replacement = c(9, 9, 9, 9, NA, NA), RightEyeReplace = c(1, 
    1, 1, 1, 1, 1), Mnths_R_Replacement = c(9, 9, 9, 9, 40, 40
    ), Time = c("0", "2", "3", "5", "0", "2"), bmi = c(26.79, 
    29.17, NA, NA, 26.88, 27.38), wototr = c(30, 27, NA, NA, 
    4, 30), wototl = c(33, 27, NA, NA, 2, 22), menr = c(1, NA, 
    NA, NA, 0, NA), menl = c(1, NA, NA, NA, NA, NA), KLGLeft = c(4, 
    NA, NA, NA, 3, 3), KLGRight = c(4, NA, NA, NA, 3, 4)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

The variables I am trying to impute are continuous scores (the range is 0-100).

When I run my code:

Y <- c("wototr", "wototl", "bmi")
meth<-make.method(PaParty)
meth[1:length(meth)]<-""
meth[Y]<-"2l.pan"
pred<-make.predictorMatrix(PaParty)   
pred[1:nrow(pred), 1:ncol(pred)]<-0 
pred[Y, "id"]<-(-2)
pred[Y, "sex"]<-1  
pred[Y, paste("x", 2:9, sep = "")] <- 1
pred[Y[1],Y[2]]<-1 
pred[Y[2], Y[1]]<-1 
pred[Y[3], Y[1:2]]<-1
imp<-mice(PaParty, meth=meth, pred=pred, m=5,
        maxit = 20, seed =500, print=FALSE)
completedData <- complete(imp,1)

I get the following:

Error in `[<-`(`*tmp*`, Y, paste("x", 2:9, sep = ""), value = 1) : 
  subscript out of bounds

In addition, the resulting imputed dataset contains negative values for wototr and wototl, which is not possible because these scores range from 0 to 100. Increasing the number of iterations does not improve this.

I would be incredibly grateful for help with this, or for suggestions of a better method for imputing this longitudinal dataset.

Upvotes: 0

Views: 723

Answers (1)

Steffen Moritz

Reputation: 7730

The out-of-bounds error occurs simply because you are trying to access something that isn't there.

This is your error:

Error in `[<-`(`*tmp*`, Y, paste("x", 2:9, sep = ""), value = 1) : 
  subscript out of bounds

It happens in this line:

pred[Y, paste("x", 2:9, sep = "")] <- 1

This line tries to access the rows of pred named in Y (which you defined earlier as "wototr", "wototl", "bmi").

But your second argument paste("x", 2:9, sep = "") just gives "x2" "x3" "x4" "x5" "x6" "x7" "x8" "x9".

When you try to access e.g. wototr / x2 and set it to 1, there simply isn't a column named x2.

Your pred looks like this: [screenshot of the predictor matrix, whose rows and columns are named after the variables in PaParty]

As you can see earlier in your code, e.g. pred[Y, "sex"]<-1 worked fine because "sex" is a column that really exists. Columns such as "x2" don't exist, which is why you get an error there.
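The failure can be reproduced without mice on a small named matrix. This is only a minimal base R sketch: the shortened variable list `vars` and the names `wanted`, `present`, `missing_cols` and `err` are illustrative and not part of the original code.

```r
# Toy stand-in for the predictor matrix: square, with rows and columns
# named after the variables (only a few of PaParty's columns are used here).
vars <- c("id", "sex", "bmi", "wototr", "wototl")
pred <- matrix(0, nrow = length(vars), ncol = length(vars),
               dimnames = list(vars, vars))
Y <- c("wototr", "wototl", "bmi")

pred[Y, "sex"] <- 1                  # works: "sex" is a column name

wanted <- paste("x", 2:9, sep = "")  # "x2" "x3" ... "x9"

# Check which requested columns exist before assigning.
present      <- intersect(wanted, colnames(pred))
missing_cols <- setdiff(wanted, colnames(pred))
print(missing_cols)

# Assigning to column names that do not exist raises the error.
err <- tryCatch({
  pred[Y, wanted] <- 1
  NULL
}, error = function(e) conditionMessage(e))
print(err)
```

Checking `colnames(pred)` (or `colnames(PaParty)`) before filling the matrix shows which predictor names are actually available.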

I don't know whether fixing this also resolves the issue with your negative values (I didn't test it), and I don't know what your thought process behind the method and predictorMatrix settings was.

But I can recommend closely and completely reading this introduction by the mice developers.

In general (I don't know any further details about your data), using pmm (predictive mean matching) as the method can help when imputed values fall outside the expected data range. With this method, imputations are always taken from values that are actually observed somewhere in the data. It often helps when you get unrealistic imputations, e.g. a negative body weight.
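To illustrate why pmm cannot produce out-of-range values, here is a simplified base R sketch of the donor-matching idea. It is not the mice implementation (which, among other things, also draws the regression parameters randomly), and the toy data, the donor count `k` and all variable names are assumptions made up for the example.

```r
set.seed(500)

# Toy data: a score y bounded to 0-100 that depends on x, with gaps.
n <- 200
x <- runif(n, 0, 10)
y <- pmin(pmax(5 * x + rnorm(n, sd = 15), 0), 100)
y[sample(n, 40)] <- NA

obs  <- !is.na(y)
fit  <- lm(y ~ x, subset = obs)                   # fitted on observed cases
yhat <- predict(fit, newdata = data.frame(x = x)) # predictions for all rows

# For each missing case, choose one of the k observed cases whose predicted
# value is closest, and copy that donor's *observed* y.
k <- 5
obs_idx <- which(obs)
y_imp <- y
for (i in which(!obs)) {
  d <- abs(yhat[obs_idx] - yhat[i])
  donors <- obs_idx[order(d)[1:k]]
  y_imp[i] <- y[donors[sample.int(k, 1)]]
}

print(range(y_imp))
```

Because every imputed value is copied from an observed case, the result stays within the observed range. In mice this method is selected with method = "pmm", which in the question's setup would mean something like meth[Y] <- "pmm"; note that plain pmm does not model the clustering within id the way 2l.pan does.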

Upvotes: 0
