Olive

Reputation: 644

How to deal with "ValueError: array must not contain infs or NaNs" while running regressions in Python

I have a df with growth variables. Some of the initial values are often 0, which produces an infinite growth value when a variable moves from zero to a non-zero value (and NaN when it stays at zero).

For example:

.. some variables... var1   var2   var1_growth  var2_growth
                      0      0        NaN          NaN
                      0      1        NaN          inf
                      1      2        inf           1
                     1.5    2.2       0.5          0.1
...
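
For reference, a minimal sketch that reproduces the pattern in the table above. It assumes the growth columns are computed with pandas' `pct_change`; the question does not say how they were actually built, so treat that as an assumption.

```python
import numpy as np
import pandas as pd

# The four sample rows from the table above.
df = pd.DataFrame({
    "var1": [0.0, 0.0, 1.0, 1.5],
    "var2": [0.0, 1.0, 2.0, 2.2],
})

# Assumption: growth is the period-over-period percentage change.
# 0 -> 0 gives 0/0 = NaN, and 0 -> non-zero gives x/0 = inf.
for col in ["var1", "var2"]:
    df[col + "_growth"] = df[col].pct_change()

print(df)
```

The first row is NaN in both growth columns because there is no previous value to compare against.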

When I run PanelOLS, I get this error message:

ValueError: array must not contain infs or NaNs

Is there a way to ignore these entries and continue with the regression, without having to drop them and create a different dataset?

If not, what would be the best way to proceed? Should I drop all rows with 'inf' values in either column? Is there an easy way to do this? Thanks.

Upvotes: 0

Views: 4354

Answers (1)

Alex Serra Marrugat

Reputation: 2042

No, you can't ignore these entries. They need to be handled before fitting the model; otherwise the fit will fail with this error.

Which method is best for handling the NaN and inf values depends on your data and application. One approach, posted in this SO question, is:

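import numpy as np
df.replace([np.inf, -np.inf], np.nan).dropna(axis=0) # Replace inf and -inf with NaN, then drop every row that contains a NaN.

This removes all rows that have inf or NaN values in any column. Note that axis=0 drops rows; axis=1 would drop whole columns instead. Both calls return a new DataFrame, so assign the result if you want to keep it.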
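
If you only want to filter on the columns that enter the regression and leave the original df as it is, something along these lines may work. This is a minimal sketch: `finite_rows` is a hypothetical helper name, the column names come from the question, and it assumes the listed columns are numeric.

```python
import numpy as np
import pandas as pd

def finite_rows(df, cols):
    """Return a copy of df keeping only rows where every column in cols is finite.

    Hypothetical helper for illustration; assumes the columns in cols are numeric.
    """
    # np.isfinite is False for NaN, inf and -inf.
    mask = np.isfinite(df[cols]).all(axis=1)
    return df.loc[mask].copy()

# Sample data matching the table in the question.
df = pd.DataFrame({
    "var1": [0.0, 0.0, 1.0, 1.5],
    "var2": [0.0, 1.0, 2.0, 2.2],
    "var1_growth": [np.nan, np.nan, np.inf, 0.5],
    "var2_growth": [np.nan, np.inf, 1.0, 0.1],
})

# Only rows where both growth columns are finite are kept.
model_df = finite_rows(df, ["var1_growth", "var2_growth"])
print(model_df)
```

The filtered frame (here `model_df`) is what you would then pass to the model. The original df is not modified, so the full data stays available for anything else you need it for.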

Upvotes: 1
