wayneeusa

Reputation: 194

'mice' R package isn't imputing data

I've run a regression to replace missing data in a dataset, and I want to compare the result with what the 'mice' package by Stef van Buuren produces.

I'm following this post on Cross Validated (link to post).

I'm also reading this (link), which uses similar syntax.

My code is:

library(mice)

imp <- mice(without_response, method = "norm.predict", m = 1)  # Impute data
imp_with_mice <- complete(imp)  # Store data

When I output:

imp_with_mice[impute_here,]

to get the rows that need imputing, none of the values have been replaced. I originally had '?' where the data was missing. I've since tried 'NA' as a string, and then NA without quote marks to match the Cross Validated post.

In no case can I get mice to replace the 16 missing values in column 7 (V7) with anything at all.

Please help me with usage.

These are examples of rows where I would expect a variable to be replaced:

        V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
24 1057013  8  4  5  1  2 NA  7  3   1
41 1096800  6  6  6  9  6 NA  7  8   1

Also, I get this output when it runs:

 iter imp variable
  1   1
  2   1
  3   1
  4   1
  5   1

Warning message:

“Number of logged events: 1”

Additional info:

str(without_response[impute_here,])

'data.frame':   16 obs. of  10 variables:
$ V1 : int  1057013 1096800 1183246 1184840 1193683 1197510 1241232 169356 432809 563649 ...
$ V2 : int  8 6 1 1 1 5 3 3 3 8 ...
$ V3 : int  4 6 1 1 1 1 1 1 1 8 ...
$ V4 : int  5 6 1 3 2 1 4 1 3 8 ...
$ V5 : int  1 9 1 1 1 1 1 1 1 1 ...
$ V6 : int  2 6 1 2 3 2 2 2 2 2 ...
$ V7 : chr  NA NA NA NA ...
$ V8 : int  7 7 2 2 1 3 3 3 2 6 ...
$ V9 : int  3 8 1 1 1 1 1 1 1 10 ...
$ V10: int  1 1 1 1 1 1 1 1 1 1 ...

summary(without_response[impute_here,])

      V1                V2              V3              V4       
Min.   :  61634   Min.   :1.000   Min.   :1.000   Min.   :1.000  
1st Qu.: 595517   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000  
Median :1057040   Median :3.000   Median :1.000   Median :2.500  
Mean   : 857578   Mean   :3.375   Mean   :2.438   Mean   :2.875  
3rd Qu.:1187051   3rd Qu.:5.000   3rd Qu.:4.000   3rd Qu.:4.250  
Max.   :1241232   Max.   :8.000   Max.   :8.000   Max.   :8.000  
       V5              V6             V7                  V8       
Min.   :1.000   Min.   :1.000   Length:16          Min.   :1.000  
1st Qu.:1.000   1st Qu.:2.000   Class :character   1st Qu.:2.000  
Median :1.000   Median :2.000   Mode  :character   Median :2.500  
Mean   :1.812   Mean   :2.438                      Mean   :3.125  
3rd Qu.:1.000   3rd Qu.:2.000                      3rd Qu.:3.250  
Max.   :9.000   Max.   :7.000                      Max.   :7.000  
      V9             V10   
Min.   : 1.00   Min.   :1  
1st Qu.: 1.00   1st Qu.:1  
Median : 1.00   Median :1  
Mean   : 2.75   Mean   :1  
3rd Qu.: 3.00   3rd Qu.:1  
Max.   :10.00   Max.   :1 

is.na(without_response[impute_here,])

       V1    V2    V3    V4    V5    V6   V7    V8    V9   V10
24  FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
41  FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
140 FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
146 FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
159 FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
165 FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
236 FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
250 FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
276 FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
293 FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
295 FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
298 FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
316 FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
322 FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
412 FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
618 FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE

Upvotes: 0

Views: 10947

Answers (1)

Niek

Reputation: 1624

As I understand your question and dataset (as I said before, a reproducible example would help), I suspect the problem is that V7 contains only NA and a single constant value. This is what the logged events warn you about. mice cannot impute such a variable because it has no basis for predicting what the missing values should be.
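A quick way to check this is to look at the column's type, the number of missing values, and the number of distinct observed values. The sketch below uses a hypothetical helper, describe_column(), and invented toy values, since the real contents of V7 are not shown in the question. Note that the str() output in the question lists V7 as chr; my assumption, which I have not verified against the mice source, is that a character column is flagged in much the same way as a constant one, so converting it to numeric first is worth trying.

```r
# Hypothetical helper: summarise why an imputer might skip a column.
describe_column <- function(x) {
  observed <- x[!is.na(x)]
  list(
    class      = class(x)[1],
    n_missing  = sum(is.na(x)),
    n_distinct = length(unique(observed))  # 0 or 1 means nothing to model
  )
}

# Toy stand-in for the question's data (values invented for illustration).
toy <- data.frame(
  V6 = c(2L, 6L, 1L, 2L, 3L, 2L),
  V7 = c("1", NA, "10", NA, "2", "3"),
  stringsAsFactors = FALSE
)

info_before <- describe_column(toy$V7)

# Character digits convert cleanly; a leftover placeholder such as "?"
# would become NA with a warning.
toy$V7 <- as.numeric(toy$V7)
info_after <- describe_column(toy$V7)

print(info_before)
print(info_after)

# With the fitted object from the question, the flagged variables are in:
# imp$loggedEvents
```

If n_distinct comes back as 0 or 1 for the real V7, the column really is constant and the explanation above applies as written.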

mice(..., method = "norm.predict") imputes plausible values from a linear regression of the variable with missing values on the other variables in your dataset. It uses the existing data to predict those values. However, since V7 is a constant, it has no variance and no covariance with the other variables, so no predictions are possible and multiple imputation cannot be used in this situation. The only imputation available is to assume that every value in V7 equals that constant (i.e. mean imputation). Be aware that this has major downsides if the assumption is invalid. Your other option is pairwise deletion.
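To make the difference between these options concrete, here is a minimal base R sketch on invented toy data (the values are not from the question). It approximates regression imputation with lm() and predict(), which is only roughly what norm.predict does and is not the mice implementation, and then shows mean imputation and pairwise deletion as the fallbacks.

```r
# Toy data with two missing values in V7 (values invented for illustration).
toy <- data.frame(
  V6 = c(2, 6, 1, 2, 3, 2, 5, 4),
  V7 = c(1, NA, 10, NA, 2, 3, 8, 5),
  V8 = c(7, 7, 2, 2, 1, 3, 6, 4)
)
missing_rows <- is.na(toy$V7)

# Regression imputation: fit on the complete rows, predict the missing ones.
# This needs V7 to vary; a constant V7 gives the model nothing to learn.
fit <- lm(V7 ~ V6 + V8, data = toy)
reg_imputed <- toy
reg_imputed$V7[missing_rows] <- predict(fit, newdata = toy[missing_rows, ])

# Mean imputation: every missing value gets the mean of the observed values.
mean_imputed <- toy
mean_imputed$V7[missing_rows] <- mean(toy$V7, na.rm = TRUE)

# Pairwise deletion: each correlation uses all rows where both variables
# are observed, instead of filling anything in.
pairwise_cor <- cor(toy, use = "pairwise.complete.obs")

print(reg_imputed$V7[missing_rows])
print(mean_imputed$V7[missing_rows])
print(round(pairwise_cor, 2))
```

Mean imputation gives every missing row the same value, which shrinks the variance of V7; that is one of the downsides mentioned above.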

Upvotes: 6
