Syeman

Reputation: 67

IterativeImputer error: Input contains NaN, infinity or a value too large for dtype('float64')

I am working on a dataset with several missing values in its attributes.

Having done the typical data preprocessing, my next step is to fit a regression model to impute the missing values. However, when I try to use the IterativeImputer from fancyimpute, I run into this error:

C:\Users\User.DC241-12\Anaconda3\lib\site-packages\sklearn\linear_model\ridge.py:942: RuntimeWarning: overflow encountered in square
  v = s ** 2
****hierarchy of filenames in which error is happening****
Input contains NaN, infinity or a value too large for dtype('float64')

I understand that missing values passed to the IterativeImputer are supposed to be represented as NaNs, so I guess that is not the cause here. Should I be scaling my data before passing it to the imputation process? But wouldn't that affect the imputation?

Thanks!

Upvotes: 0

Views: 1243

Answers (1)

jGraves

Reputation: 83

I had a similar issue. In my case, some of the values being fed into the imputer were quite large (values > 10,000,000) and I had a large dataset (500,000+ rows). These large values get compounded somehow in the algorithm that IterativeImputer uses, and overflow numpy's float64.
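Before changing anything, it may help to check how large the values actually are and whether any infinities are hiding among the NaNs. The snippet below is only a rough sketch: `magnitude_report` is a hypothetical helper name, and the last lines merely reproduce the kind of `overflow encountered in square` warning shown in the question, not necessarily the exact code path inside the imputer.

```python
import numpy as np


def magnitude_report(X):
    """Count NaN/inf entries and report the largest finite absolute value."""
    X = np.asarray(X, dtype=float)
    finite = X[np.isfinite(X)]
    return {
        "n_nan": int(np.isnan(X).sum()),   # fine: NaN marks missing values
        "n_inf": int(np.isinf(X).sum()),   # not fine: inf is rejected by the imputer
        "max_abs": float(np.abs(finite).max()) if finite.size else float("nan"),
    }


# Hypothetical toy input: one NaN, one inf, one large value.
report = magnitude_report(np.array([[1.0, np.nan], [np.inf, -3e7]]))
print(report)

# Squaring a very large float64 overflows to inf, which is what the
# "overflow encountered in square" RuntimeWarning reports.
with np.errstate(over="ignore"):
    v = np.array([1e200]) ** 2
print(v)
```

If `n_inf` is nonzero, or `max_abs` is many orders of magnitude above the other columns, that is a likely candidate for the problem.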

Try scaling your values, imputing, and then scaling back up (reverse the process of scaling down) once the imputation is done.
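A minimal sketch of that scale, impute, unscale round trip is below. It assumes scikit-learn's `IterativeImputer` (still flagged experimental there, hence the extra import) and `StandardScaler`, which skips NaNs when fitting and leaves them in place when transforming. The function name `scale_impute_unscale` and the toy data are made up for illustration.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (needed for the next import)
from sklearn.impute import IterativeImputer
from sklearn.preprocessing import StandardScaler


def scale_impute_unscale(X, random_state=0):
    """Standardize columns, impute NaNs, then map back to the original units."""
    X = np.asarray(X, dtype=float)
    scaler = StandardScaler()              # NaNs are ignored in fit and kept in transform
    X_scaled = scaler.fit_transform(X)
    imputer = IterativeImputer(random_state=random_state)
    X_imputed_scaled = imputer.fit_transform(X_scaled)
    return scaler.inverse_transform(X_imputed_scaled)


# Hypothetical toy data: large magnitudes, roughly 10% of entries missing.
rng = np.random.default_rng(0)
X = rng.normal(loc=5e7, scale=1e7, size=(200, 3))
X[:, 2] = 2.0 * X[:, 0] + rng.normal(scale=1e5, size=200)
mask = rng.random(X.shape) < 0.1
X_missing = X.copy()
X_missing[mask] = np.nan

X_filled = scale_impute_unscale(X_missing)
print(np.isnan(X_filled).any())
```

Because standardization is a per-column linear transform and it is inverted afterwards, the observed values come back unchanged (up to floating-point error). The imputed values can differ slightly from an unscaled run, since the default regularized regressor is not scale-invariant.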

Upvotes: 0
