Anienumaked

Reputation: 85

How can I use missRanger to impute missing integer values?

I am learning about imputation by using R and missRanger to impute missing values in columns that must be integers. However, I get the following error:

## Error: Assigned data `if (...) NULL` must be compatible with existing data.
## i Error occurred for column `beds`.
## x Can't convert from <double> to <integer> due to loss of precision.
## * Locations: 1, 2.

It seems that I cannot impute integer columns directly, but it works if I convert them to doubles first.

Here is a reprex:

library(tidyverse)
library(missRanger)

# Here is a sample of the data
reprex_df

## # A tibble: 9 x 5
##    beds baths garages  price property_type
##   <int> <int>   <int>  <int> <chr>        
## 1    NA    NA      NA 770000 house        
## 2     2     1       0 300000 apartment    
## 3     2     2       2 735000 apartment    
## 4    NA    NA      NA 550000 apartment    
## 5     4     2       3 500000 house        
## 6     2     1       0 400000 apartment    
## 7     4     2       2 607000 house        
## 8     3     2       2 590000 house        
## 9     4     1       2 710000 house

# Try to impute missing bedrooms
imputed <- reprex_df %>% 
  missRanger()

## 
## Missing value imputation by random forests
## 
##   Variables to impute:       beds, baths, garages
##   Variables used to impute:  beds, baths, garages, price, property_type
## iter 1:  

## Error: Assigned data `if (...) NULL` must be compatible with existing data.
## i Error occurred for column `beds`.
## x Can't convert from <double> to <integer> due to loss of precision.
## * Locations: 1, 2.

# Convert integers to numerics and try again
imputed2 <- reprex_df %>% 
  mutate_if(is.integer,
            as.numeric) %>% 
  missRanger()

## 
## Missing value imputation by random forests
## 
##   Variables to impute:       beds, baths, garages
##   Variables used to impute:  beds, baths, garages, price, property_type
## iter 1:  ...
## iter 2:  ...
## iter 3:  ...
## iter 4:  ...
## iter 5:  ...

# That works, but decimal rooms don't make sense
imputed2

## # A tibble: 9 x 5
##    beds baths garages  price property_type
##   <dbl> <dbl>   <dbl>  <dbl> <chr>        
## 1  3.44  1.86    2.15 770000 house        
## 2  2     1       0    300000 apartment    
## 3  2     2       2    735000 apartment    
## 4  2.77  1.83    1.84 550000 apartment    
## 5  4     2       3    500000 house        
## 6  2     1       0    400000 apartment    
## 7  4     2       2    607000 house        
## 8  3     2       2    590000 house        
## 9  4     1       2    710000 house

How can I impute missing integers using missRanger?

Upvotes: 2

Views: 711

Answers (1)

Michael M

Reputation: 1593

Calling a dataset "reprex" does not make the example reproducible...

Since missRanger cannot change how a tibble handles type casting internally, here are two suggestions:

  1. Convert the tibble to a data.frame before calling missRanger, or (and this is my favourite)

  2. Use the argument pmm.k to apply predictive mean matching between the iterations. This has the nice side effect of filling gaps with realistic values, that is, values that actually occur in the data. Integers will stay integers, etc.

The vignette of missRanger explains these concepts, see https://cran.r-project.org/web/packages/missRanger/index.html

Disclaimer: I am the package maintainer of missRanger.

library(missRanger)
library(tidyverse)

# Example data
mtcars2 <- mtcars %>% 
  as_tibble() %>% 
  mutate(cyl = as.integer(cyl)) %>% 
  generateNA()

missRanger(mtcars2, pmm.k = 3, seed = 153)

# Gives
# # A tibble: 32 x 11
#    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#  <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#     21     6   160   105   3.9  2.62  16.5     0     1     4     4
#     21     6   160   110   3.9  2.88  17.0     0     1     4     4
#   22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
#   21.4     6   258   110  3.08  3.22  19.4     1     0     3     2
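
For completeness, the first suggestion (converting to a data.frame) could look roughly like the sketch below. It rebuilds the question's data by hand from the printed tibble, and it stores `property_type` as a factor, which is an assumption made only to keep the column type unambiguous. The final rounding step is also an addition for illustration and is not something missRanger does itself: without `pmm.k`, the imputed values can be fractional, so they are rounded and cast back to integer afterwards.

```r
library(missRanger)

# Stand-in for the question's data; values copied from the printed tibble.
# property_type is a factor here (an assumption, see the text above).
reprex_df <- data.frame(
  beds    = c(NA, 2L, 2L, NA, 4L, 2L, 4L, 3L, 4L),
  baths   = c(NA, 1L, 2L, NA, 2L, 1L, 2L, 2L, 1L),
  garages = c(NA, 0L, 2L, NA, 3L, 0L, 2L, 2L, 2L),
  price   = c(770000L, 300000L, 735000L, 550000L, 500000L,
              400000L, 607000L, 590000L, 710000L),
  property_type = factor(c("house", "apartment", "apartment", "apartment",
                           "house", "apartment", "house", "house", "house"))
)

# as.data.frame() changes nothing for this object, but it is the step
# that matters when the input is a tibble.
imputed <- missRanger(as.data.frame(reprex_df), seed = 153, verbose = 0)

# Post-processing (not part of missRanger): round the imputed values and
# restore the integer type of the count columns.
int_cols <- c("beds", "baths", "garages")
imputed[int_cols] <- lapply(imputed[int_cols],
                            function(x) as.integer(round(x)))

str(imputed[int_cols])
```

Rounding can still produce combinations that never occur in the observed data, which is one reason to prefer `pmm.k`.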

Upvotes: 2
