Reputation: 644
I have a df with growth variables, and some initial values are often 0. This produces infinite growth values whenever a variable moves from zero to a non-zero value.
For example:
... (other variables) ... var1 var2 var1_growth var2_growth
0 0 NaN NaN
0 1 NaN inf
1 2 inf 1
1.5 2.2 0.5 0.1
...
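A minimal sketch that reproduces the table above. It assumes growth is defined as the period-over-period relative change, (x_t - x_{t-1}) / x_{t-1}; the question does not say how the growth columns were computed. Under that definition 0 -> 0 gives 0/0 = NaN and 0 -> non-zero gives x/0 = inf.

```python
import numpy as np
import pandas as pd

# Hypothetical data matching the table in the question.
df = pd.DataFrame({"var1": [0, 0, 1, 1.5], "var2": [0, 1, 2, 2.2]})

# Assumed growth definition: (x_t - x_{t-1}) / x_{t-1}.
# The first row is NaN (no previous value); 0 -> 0 gives NaN, 0 -> non-zero gives inf.
growth = df[["var1", "var2"]].diff() / df[["var1", "var2"]].shift()
df["var1_growth"] = growth["var1"]
df["var2_growth"] = growth["var2"]
print(df)
```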
When I run PanelOLS, I get this error message:
ValueError: array must not contain infs or NaNs
Is there a way to ignore these entries and continue with the regression without having to drop them and create a different dataset?
If not, what would be the best way to proceed? Should I drop all rows with 'inf' values in both columns? Is there an easy way to do this? Thanks.
Upvotes: 0
Views: 4354
Reputation: 2042
No, you can't ignore these entries. They need to be handled before fitting the model; otherwise the model cannot be fitted.
Which method is best for handling these NaN and inf values depends on your data and application. One example, from code posted in this SO question:
import numpy as np
df = df.replace([np.inf, -np.inf], np.nan).dropna()  # replace inf and -inf with NaN, then keep only rows with no NaN (dropna(axis=1) would drop columns instead)
In this case, we are removing all rows that have inf or NaN values.
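Note that dropna() with no arguments drops a row if any column is NaN, including columns unrelated to the regression. A sketch that only checks the growth columns, assuming they are named var1_growth and var2_growth as in the question, shows two equivalent options. Both return a filtered copy, so the original df stays unchanged and the filtered frame can be used for the regression.

```python
import numpy as np
import pandas as pd

# Hypothetical data matching the table in the question.
df = pd.DataFrame({
    "var1": [0, 0, 1, 1.5],
    "var2": [0, 1, 2, 2.2],
    "var1_growth": [np.nan, np.nan, np.inf, 0.5],
    "var2_growth": [np.nan, np.inf, 1.0, 0.1],
})
growth_cols = ["var1_growth", "var2_growth"]  # assumed column names

# Option 1: turn +/-inf into NaN, then drop rows with NaN in the growth columns only.
clean = df.replace([np.inf, -np.inf], np.nan).dropna(subset=growth_cols)

# Option 2: keep rows where every growth value is finite
# (np.isfinite is False for NaN, inf and -inf).
mask = np.isfinite(df[growth_cols]).all(axis=1)
clean_mask = df[mask]

print(clean)
```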
Upvotes: 1