RJL

Reputation: 351

pandas mean() results in INF?

df['C'] = df['A'] / df['B']
df['C'].mean()

This returns a valid value.

However,

df['D'] = df['B'] / df['A']
df['D'].mean()

returns inf.

Any thoughts on why? What does 'inf' mean here?

I downloaded the file and calculated the mean of 'D' in Excel, and it returned a valid value. There are no invalid values in 'D'.

Upvotes: 1

Views: 3169

Answers (1)

David Erickson

Reputation: 16683

Column A most likely contains a zero in at least one row, so dividing by it produces inf in that row. Consider this example:

df = pd.DataFrame({'A' : [1,2,3,0],
                   'B' : [2,3,4,5]})
df['C'] = df['B'] / df['A']
df['D'] = df['A'] / df['B']
df
Out[1]: 
   A  B         C         D
0  1  2  2.000000  0.500000
1  2  3  1.500000  0.666667
2  3  4  1.333333  0.750000
3  0  5       inf  0.000000

Because df['A'] contains a zero, df['C'] = df['B'] / df['A'] evaluates to inf in the row where A = 0. Strictly speaking, division by zero is undefined, but pandas uses floating-point arithmetic, where a nonzero number divided by zero gives inf (infinity) rather than raising an error. When the zero is in the numerator instead, as in df['D'] = df['A'] / df['B'], the result is simply 0.

A single inf among otherwise finite values makes the sum, and therefore the mean(), inf. That is why your mean comes out as inf. One fix is to replace the inf values with np.nan, which mean() skips by default:

df = df.replace(np.inf, np.nan)
df
Out[2]: 
   A  B         C         D
0  1  2  2.000000  0.500000
1  2  3  1.500000  0.666667
2  3  4  1.333333  0.750000
3  0  5       NaN  0.000000
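
Note that replace(np.inf, np.nan) only catches positive infinity. If a numerator can be negative, the division can also produce -inf, and 0 / 0 produces NaN directly. A minimal sketch with made-up values (not your data) that handles both signs:

```python
import numpy as np
import pandas as pd

# Hypothetical data chosen to hit every edge case:
# 5 / 0 -> inf, -5 / 0 -> -inf, 0 / 0 -> NaN
df = pd.DataFrame({'A': [1, 0, 0, 0],
                   'B': [2, 5, -5, 0]})
ratio = df['B'] / df['A']

# Replace both infinities so mean() can skip them like any other NaN
cleaned = ratio.replace([np.inf, -np.inf], np.nan)
mean_cleaned = cleaned.mean()  # skipna=True is the default
```

Here only the first row survives the cleaning, so the mean is taken over a single finite value.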

Now you can take the mean of each column. Full code below:

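import pandas as pd
import numpy as np

df = pd.DataFrame({'A' : [1,2,3,0],
                   'B' : [2,3,4,5]})
# A is the denominator here, so the row with A = 0 becomes inf
df['C'] = df['B'] / df['A']
# A is the numerator here, so the row with A = 0 becomes 0
df['D'] = df['A'] / df['B']
# Turn inf into NaN, which mean() skips by default
df = df.replace(np.inf, np.nan)
df['C'].mean(), df['D'].mean()
Out[3]: (1.611111111111111, 0.47916666666666663)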
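
To confirm that a zero denominator is the cause in your own data, you can list the rows where the ratio is infinite before replacing anything. A sketch on the same example frame; the names bad_rows and n_zero_denominators are just illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 0],
                   'B': [2, 3, 4, 5]})
df['C'] = df['B'] / df['A']

# Rows where the ratio is +inf or -inf
bad_rows = df[np.isinf(df['C'])]

# Count of zeros in the denominator column
n_zero_denominators = int((df['A'] == 0).sum())
```

If bad_rows is empty on your real data, the inf is coming from somewhere other than a zero denominator and is worth investigating separately.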

Upvotes: 1
