Reputation: 581
I'm trying to impute missing values in a dataset using Hmisc. I'm following this guide.
Here is a reproducible example of my code:
# Create the dataset and randomly set 10% of the values to NA
data <- iris
library(missForest)
library(Hmisc)
iris.mis <- prodNA(iris, noNA = 0.1)
# Calculate imputed values with aregImpute
impute_arg <- aregImpute(~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width + Species,
                         data = iris.mis, n.impute = 5)
completeData2 <- impute.transcan(impute_arg, imputation = 1, data = iris.mis,
                                 list.out = TRUE, pr = FALSE, check = FALSE)
head(completeData2)
# Fit a model over the multiple imputations
library(rms)
fmi <- fit.mult.impute(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
                       ols, impute_arg, data = iris.mis)
My question is: how do I apply this fitted model to my data to impute the NA values in my dataset (iris.mis)?
Answers with code snippets would be greatly appreciated.
Upvotes: 3
Views: 755
Reputation: 174468
All you need to do is get the model's predictions:
model_predictions <- predict(fmi)
Now you can examine the predictions at the data's missing indices:
missing <- which(is.na(iris.mis$Sepal.Length))
imputed <- model_predictions[missing]
imputed
#>         5        22        27        32        34        35        54        60
#> 5.073695* 5.119113* 5.182343* 4.949794* 5.381427* 4.863149* 5.565716* 5.596861*
#>        89       102       107       117       131       135       145       149
#> 5.950823* 6.217764* 5.757642* 6.829916* 7.116657* 6.726274* 6.738296* 6.662452*
#>       150
#> 6.428420*
And see how they compare to the actual values:
actual <- iris$Sepal.Length[missing]
plot(x = actual, y = imputed, xlim = c(4, 8), ylim = c(4, 8), col = "red",
xlab = "Actual", ylab = "Imputed", main = "Imputed vs Actual Sepal Length")
lines(c(4, 8), c(4, 8), lty = 2)
# calculate residuals
imputed - actual
#>            5           22           27           32           34           35
#>  0.07369483*  0.01911295*  0.18234346* -0.45020634* -0.11857279* -0.03685114*
#>           54           60           89          102          107          117
#>  0.06571631*  0.39686061*  0.35082282*  0.41776385*  0.85764178*  0.32991602*
#>          131          135          145          149          150
#> -0.28334270*  0.62627448*  0.03829600*  0.46245174*  0.52842038*

# sum of squared errors
sum((imputed - actual)^2)
#> [1] 2.52802
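The sum of squared errors grows with the number of missing values, so it can help to convert it to a per-value error on the scale of the data. A minimal sketch, assuming the residuals printed above are copied in by hand (`resid_vals`, `sse`, `rmse` and `mae` are illustrative names, not part of the original code):

```r
# Residuals (imputed - actual) copied from the output above
resid_vals <- c(
   0.07369483,  0.01911295,  0.18234346, -0.45020634, -0.11857279, -0.03685114,
   0.06571631,  0.39686061,  0.35082282,  0.41776385,  0.85764178,  0.32991602,
  -0.28334270,  0.62627448,  0.03829600,  0.46245174,  0.52842038
)

sse  <- sum(resid_vals^2)        # sum of squared errors, as computed above
rmse <- sqrt(mean(resid_vals^2)) # root mean squared error, in cm
mae  <- mean(abs(resid_vals))    # mean absolute error, in cm

round(c(sse = sse, rmse = rmse, mae = mae), 4)
```

With the live objects from this answer, the same quantities would come from `imputed - actual` directly instead of the hand-copied vector.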
So, if you want a new column in your data set with the missing values filled in by the model's predictions, you can do:
iris.mis$Sepal.Length.Imputed <- iris.mis$Sepal.Length
iris.mis$Sepal.Length.Imputed[is.na(iris.mis$Sepal.Length.Imputed)] <- imputed
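The same fill-in pattern can be tried end to end without Hmisc or rms. The sketch below is not the `fit.mult.impute` workflow: it swaps in base R's `lm()` as a stand-in model and assumes, for simplicity, that only `Sepal.Length` has missing values. The names `iris_toy`, `missing_rows` and `is_missing` are made up for illustration.

```r
# Toy setup (assumption): remove 15 Sepal.Length values, leave predictors complete
set.seed(42)
iris_toy <- iris
missing_rows <- sample(nrow(iris_toy), 15)
iris_toy$Sepal.Length[missing_rows] <- NA

# Stand-in model: lm() drops rows with a missing response by default,
# so this is fitted on the observed rows only
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
          data = iris_toy)

# Predict only for the rows whose response is missing
is_missing <- is.na(iris_toy$Sepal.Length)
predicted <- predict(fit, newdata = iris_toy[is_missing, ])

# Copy the original column, then fill the gaps with the predictions
iris_toy$Sepal.Length.Imputed <- iris_toy$Sepal.Length
iris_toy$Sepal.Length.Imputed[is_missing] <- predicted

# No NA values should remain in the new column
sum(is.na(iris_toy$Sepal.Length.Imputed))
```

Passing the incomplete rows through `newdata` keeps the predictions in the same row order as the gaps being filled, which is what the assignment relies on.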
Upvotes: 3