user3231352

Reputation: 809

Missing values when creating training and testing data with caret

My question is about how to handle missing values when fitting models with caret's train function. A small sample of my data looks like this:

       # Output of dput(dat), assigned to df so the sample can be recreated
       df <- structure(list(LagO3 = c(NA, NA, NA, 40, 45, NA), RH = c(69.4087524414062, 
       79.9608383178711, 64.4592437744141, 66.4207077026367, 66.0899200439453, 
       91.3353729248047), SR = c(298.928888888889, 300.128888888889, 
       303.688888888889, 304.521111111111, 303.223333333333, 294.716666666667
       ), ST = c(317.9917578125, 317.448253038194, 311.039059244792, 
       312.557927517361, 321.252841796875, 330.512212456597), Tmx = c(294.770359293045, 
       294.897191864461, 295.674552786042, 296.247345044048, 296.108238352818, 
       294.594430242372), CWTE = c(0, 1, 0, 0, 0, 0), CWTW = c(0, 0, 
       0, 0, 0, 0), o3 = c(NA, NA, NA, 52, 55, NA)), .Names = c("LagO3", 
       "RH", "SR", "ST", "Tmx", "CWTE", "CWTW", "o3"), row.names = c("1", 
       "2", "3", "4", "5", "6"), class = "data.frame")

The problem is that one of my predictors has NA in several positions, and the predictand (o3) also has NA values, but in different positions. First, I tried:

model <- train(x = na.omit(x.training), y = na.omit(training$o3), method = "lmStepAIC",
               direction="backward", trControl = control)

But because the NAs are in different positions, x and y would end up with different lengths. I then tried:

model <- train(x = x.training, y = training$o3, na.action = na.pass,
               method = "lmStepAIC", direction = "backward", trControl = control)

which gives the following error:

Error in quantile.default(y, probs = seq(0, 1, length = cuts)) : missing values and NaN's not allowed if 'na.rm' is FALSE

I would appreciate any suggestion!

Thanks a lot.

Upvotes: 2

Views: 7725

Answers (1)

LyzandeR

Reputation: 37889

You need to pass na.omit as the na.action argument of the train function. As the documentation for na.action says (see ?train):

A function to specify the action to be taken if NAs are found. The default action is for the procedure to fail. An alternative is na.omit, which leads to rejection of cases with missing values on any required variable. (NOTE: If given, this argument must be named.)

So the following will work:

model <- train(x = x.training, y = training$o3,
               method = "lmStepAIC", direction = "backward",
               trControl = control, na.action = na.omit)

Output:

> model <- train(x = x.training, y = y.training, method = "lmStepAIC",direction="backward",
+                na.action=na.omit)
Start:  AIC=-129.7
.outcome ~ LagO3 + RH + SR + ST + Tmx + CWTE + CWTW


Step:  AIC=-129.7
.outcome ~ LagO3 + RH + SR + ST + Tmx + CWTE


Step:  AIC=-129.7
.outcome ~ LagO3 + RH + SR + ST + Tmx


Step:  AIC=-129.7
.outcome ~ LagO3 + RH + SR + ST


Step:  AIC=-129.7
.outcome ~ LagO3 + RH + SR


Step:  AIC=-129.7
.outcome ~ LagO3 + RH


Step:  AIC=-129.7
.outcome ~ LagO3


Step:  AIC=-129.7
.outcome ~ 1
...
...
...
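
The length mismatch described in the question can also be avoided by dropping incomplete rows jointly before calling train. The following is only a minimal sketch in base R: the data frame toy and its values are made up for illustration (they are not the asker's data), and x.toy, y.toy, x.clean and y.clean are hypothetical names. It shows why applying na.omit to x and y separately gives different lengths, and how complete.cases keeps the two aligned.

```r
# Hypothetical toy data: NAs sit in different rows of the predictor and the outcome
toy <- data.frame(
  LagO3 = c(NA, 38, 41, 40, 45, NA),
  RH    = c(69.4, 80.0, 64.5, 66.4, 66.1, 91.3),
  o3    = c(50, NA, 47, 52, 55, 49)
)
x.toy <- toy[, c("LagO3", "RH")]
y.toy <- toy$o3

# Applying na.omit separately drops different rows, so the lengths no longer match
n.x <- nrow(na.omit(x.toy))    # rows 1 and 6 are dropped from the predictors
n.y <- length(na.omit(y.toy))  # only row 2 is dropped from the outcome

# Keep only the rows that are complete in BOTH the predictors and the outcome
keep    <- complete.cases(x.toy, y.toy)
x.clean <- x.toy[keep, , drop = FALSE]
y.clean <- y.toy[keep]

# The cleaned objects could then be passed to train (requires caret and the
# 'control' object from the question, so it is left commented out here):
# model <- train(x = x.clean, y = y.clean, method = "lmStepAIC",
#                direction = "backward", trControl = control)
```

Filtering once with a single logical index keeps each outcome value paired with its own predictor row, which separate na.omit calls cannot guarantee.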

Upvotes: 4
