Reputation: 351
df['C'] = df['A'] / df['B']
df['C'].mean()
This returns a valid value.
However,
df['D'] = df['B'] / df['A']
df['D'].mean()
returns inf.
Any thoughts on why? What does inf mean here?
I downloaded the file and calculated the mean of 'D' in Excel. It returned a valid value, and there are no invalid values in 'D'.
Upvotes: 1
Views: 3169
Reputation: 16683
You have a zero in one of your rows, which is creating issues. Consider the example:
df = pd.DataFrame({'A': [1, 2, 3, 0],
                   'B': [2, 3, 4, 5]})
df['C'] = df['B'] / df['A']
df['D'] = df['A'] / df['B']
df
Out[1]:
   A  B         C         D
0  1  2  2.000000  0.500000
1  2  3  1.500000  0.666667
2  3  4  1.333333  0.750000
3  0  5       inf  0.000000
Because df['A'] has a zero in it, the calculation df['C'] = df['B'] / df['A'] gives inf (infinity) for the row where A = 0. That is how floating-point division treats a nonzero number divided by 0. When the 0 is in the numerator instead, the result is 0, as you would expect mathematically.
Therefore, when you take the mean() of several values and one of them is inf, the mean is also inf. The solution is to replace the inf values with np.nan:
df = df.replace(np.inf, np.nan)
df
Out[2]:
   A  B         C         D
0  1  2  2.000000  0.500000
1  2  3  1.500000  0.666667
2  3  4  1.333333  0.750000
3  0  5       NaN  0.000000
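One caveat: np.inf only matches positive infinity, and a negative number divided by zero gives -inf. Here is a minimal sketch that replaces both signs. The frame demo and its values are made up for illustration and are not the asker's data:

```python
import numpy as np
import pandas as pd

# Hypothetical data: a negative numerator over a zero denominator yields -inf.
demo = pd.DataFrame({'A': [1, 0, 0], 'B': [2, 5, -5]})
demo['C'] = demo['B'] / demo['A']   # 2.0, inf, -inf

# Replace both positive and negative infinity with NaN.
cleaned = demo.replace([np.inf, -np.inf], np.nan)

# mean() skips NaN by default (skipna=True), so only the finite value is averaged.
c_mean = cleaned['C'].mean()
print(c_mean)
```

If your data can never have a negative numerator, replacing np.inf alone is enough.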
Now, you are ready to get the means of each column. Full code below:
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, 2, 3, 0],
                   'B': [2, 3, 4, 5]})
df['C'] = df['B'] / df['A']
df['D'] = df['A'] / df['B']
df = df.replace(np.inf, np.nan)
df['C'].mean(), df['D'].mean()
Out[3]: (1.611111111111111, 0.47916666666666663)
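As an alternative, you could avoid creating inf in the first place by masking the zero denominators before dividing. The sketch below reuses the same example frame. The names zero_rows, safe_a and c_mean are only illustrative, and whether dropping those rows from the mean is appropriate depends on your data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 0],
                   'B': [2, 3, 4, 5]})

# Rows with a zero denominator are the ones that would produce inf.
zero_rows = df[df['A'] == 0]
print(zero_rows)

# where() keeps values that satisfy the condition and puts NaN elsewhere,
# so the division returns NaN instead of inf for those rows.
safe_a = df['A'].where(df['A'] != 0)
df['C'] = df['B'] / safe_a

# NaN is skipped by mean(), so only the three finite ratios are averaged.
c_mean = df['C'].mean()
print(c_mean)
```

Looking at zero_rows first also tells you which records in the original file have a zero in the denominator column, which may be worth checking before excluding them.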
Upvotes: 1