Reputation: 21
When I create a new column in my pandas dataframe by dividing one existing column by another, I get 'inf' in rows where there is no division by zero.
claims_report['% COST DIFFERENCE'] = 100*claims_report['COST DIFFERENCE']/claims_data['ORIGINAL UNIT COST']
print(claims_report[['ORIGINAL UNIT COST','COST DIFFERENCE','% COST DIFFERENCE']].head(9))
The result of the above code is:
ORIGINAL UNIT COST COST DIFFERENCE % COST DIFFERENCE
0 4.3732 11.2500 257.248697
1 3.7935 22.0000 579.939370
2 6.9167 22.0000 318.070756
3 1.1429 4.5000 393.735235
4 0.0000 7.3269 inf
5 7.3269 -0.8622 -11.767596
6 6.4647 0.7853 12.147509
7 0.2590 0.0170 6.563707
8 14.4471 -12.7145 -inf
By my calculations, there should not be a -inf in row 8. As a check I ran the following code:
for i in range(9):
    print(i, claims_report['COST DIFFERENCE'][i], claims_report['ORIGINAL UNIT COST'][i], claims_report['COST DIFFERENCE'][i]/claims_report['ORIGINAL UNIT COST'][i])
Which gives me the expected result in row 8:
0 11.25 4.3732 2.5724869660660388
1 22.0 3.7935 5.799393699749571
2 22.0 6.9167 3.180707562855119
3 4.5 1.1429 3.937352349286902
4 7.3269 0.0 inf
5 -0.8622 7.3269 -0.11767596118412971
6 0.7853 6.4647 0.1214750877844293
7 0.017 0.259 0.06563706563706564
8 -12.7145 14.4471 -0.880072817382035
Anyone familiar with this type of issue?
Upvotes: 0
Views: 2852
Reputation: 286
Another option for the future is:
import pandas as pd
pd.set_option('use_inf_as_na', True)
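which makes pandas treat any 'inf' values in your dataframe as 'nan' (note that this option is deprecated in recent pandas releases). Then you can use the fillna
method like this (without inplace=True, since fillna returns None when inplace=True and the assignment would overwrite df with None):
df = df.fillna(value=0)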
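If the global option is unavailable or you would rather not change pandas-wide behaviour, the same clean-up can be done explicitly. The following is only a sketch: the frame and the column names "num", "den" and "ratio" are made up for illustration, and replacing 'inf' with 0 is an assumption about what you want in those rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first row divides by zero.
df = pd.DataFrame({
    "num": [7.3269, -12.7145, 0.017],
    "den": [0.0, 14.4471, 0.259],
})

# Float division by zero yields inf/-inf rather than raising.
df["ratio"] = df["num"] / df["den"]

# Turn both infinities into NaN, then fill the NaN values with 0.
df["ratio"] = df["ratio"].replace([np.inf, -np.inf], np.nan).fillna(0)

print(df)
```

This only hides the 'inf' values; it does not explain an unexpected 'inf' in a row whose denominator is nonzero.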
Upvotes: 0
Reputation: 101
In your first line
claims_report['% COST DIFFERENCE'] = 100*claims_report['COST DIFFERENCE']/claims_data['ORIGINAL UNIT COST']
Didn't you mean "claims_report" instead of "claims_data" in the denominator? Maybe you're just dividing by a column from the wrong dataframe?
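To illustrate how that mix-up could produce the '-inf', here is a small sketch. The contents of claims_data are an assumption (the question never shows that dataframe); I have simply given it a zero in the second row. pandas aligns the two Series on their index, so the division uses whatever value claims_data holds at the same index label.

```python
import numpy as np
import pandas as pd

# Two rows taken from the question's printed output.
claims_report = pd.DataFrame({
    "ORIGINAL UNIT COST": [7.3269, 14.4471],
    "COST DIFFERENCE": [-0.8622, -12.7145],
})

# Assumed contents: a different frame whose second row happens to be zero.
claims_data = pd.DataFrame({"ORIGINAL UNIT COST": [7.3269, 0.0]})

# Denominator from the other frame: the second row is divided by 0.0.
wrong = 100 * claims_report["COST DIFFERENCE"] / claims_data["ORIGINAL UNIT COST"]

# Denominator from the same frame: finite everywhere.
right = 100 * claims_report["COST DIFFERENCE"] / claims_report["ORIGINAL UNIT COST"]

print(wrong)
print(right)
```

Comparing claims_report['ORIGINAL UNIT COST'] with claims_data['ORIGINAL UNIT COST'] for the affected rows should show whether this is what happened in your case.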
Upvotes: 1