Reputation: 2668
What happens when the built-in max() and min() are applied to a pandas.core.series.Series that contains NaN? Is this a bug? See below:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
mydata = pd.DataFrame(np.random.standard_normal((100,1)), columns=['No NaN'])
mydata['Has NaN'] = mydata['No NaN'] / mydata['No NaN'].shift(1)
# Both return NaN!
print(min(mydata['Has NaN']), max(mydata['Has NaN']))
# Still why False? Isn't float('nan') a singleton like None?
print(min(mydata['Has NaN']) == max(mydata['Has NaN']))
# But this time works well!
print(min([1, 2, 3, float('nan')]))
print('\n')
# What should I do when a Series that contains NaN is passed to code that
# calls min() and max() internally? E.g.,
try:
    n, bins, patches = plt.hist(mydata['Has NaN'], 10)
except ValueError as e:
    print(e, '\nSeems "range" argument in hist() has problem!')
Upvotes: 2
Views: 175
Reputation: 96028
First, you shouldn't use the Python built-in max or min when dealing with pandas or numpy, especially when you are working with nan.
Since nan is the first item of mydata['Has NaN'], neither max nor min ever replaces it: every comparison against it returns False (as stated in the docs):
The not-a-number values float('NaN') and Decimal('NaN') are special. They are identical to themselves (x is x is true) but are not equal to themselves (x == x is false). Additionally, comparing any number to a not-a-number value will return False. For example, both 3 < float('NaN') and float('NaN') < 3 will return False.
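The quoted rule can be checked in plain Python. This is a minimal sketch, assuming the built-in min and max keep the first item as the running result and replace it only when a later item compares as strictly smaller or larger; the two sample lists are made up for illustration.

```python
nan = float('nan')

# Every ordered comparison involving NaN is False, and NaN is not equal to itself,
# even though it is the very same object.
print(3 < nan, nan < 3)      # False False
print(nan == nan, nan is nan)  # False True

# NaN first: no later item ever compares as smaller or larger, so NaN is kept.
nan_first = [nan, 1.0, 2.0, 3.0]
print(min(nan_first), max(nan_first))  # nan nan

# NaN last: the running result is never replaced by NaN.
nan_last = [1.0, 2.0, 3.0, nan]
print(min(nan_last), max(nan_last))    # 1.0 3.0
```

This also accounts for the False in the question: min(...) == max(...) compares nan with nan, and that comparison is False whether or not the two values are the same object.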
Instead, use the pandas max and min methods:
In [4]: mydata['Has NaN'].min()
Out[4]: -176.9844930355774
In [5]: mydata['Has NaN'].max()
Out[5]: 12.684033138603787
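The methods return finite values because pandas reductions skip NaN by default (the skipna parameter). A small sketch on a made-up Series, with the NumPy equivalents np.nanmin and np.nanmax shown for comparison:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a leading NaN, like the shifted column in the question.
s = pd.Series([np.nan, 2.0, -1.5, 4.0])

print(s.min(), s.max())       # -1.5 4.0  (NaN is skipped by default)
print(s.min(skipna=False))    # nan       (NaN propagates when not skipped)

# NumPy on the underlying array: the nan-aware functions ignore NaN,
# the plain ones propagate it.
arr = s.to_numpy()
print(np.nanmin(arr), np.nanmax(arr))  # -1.5 4.0
print(np.min(arr))                     # nan
```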
With regard to the histogram, this seems to be a known issue with plt.hist (see here and here).
It should be fairly straightforward to deal with for now, though:
n, bins, patches = plt.hist(mydata['Has NaN'][~mydata['Has NaN'].isnull()], 10)
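The same filtering can be written with dropna(). The sketch below rebuilds data like the question's with a fixed seed (an assumption, for reproducibility) and bins it with np.histogram rather than plt.hist so that it runs without a display; the cleaned Series can be passed to plt.hist in the same way.

```python
import numpy as np
import pandas as pd

# Seeded stand-in for the question's random data (the seed is an assumption).
rng = np.random.default_rng(0)
no_nan = pd.Series(rng.standard_normal(100))
has_nan = no_nan / no_nan.shift(1)   # the first element is NaN

# dropna() keeps the same rows as has_nan[~has_nan.isnull()]
clean = has_nan.dropna()

# Bin the cleaned values into 10 bins, as plt.hist(clean, 10) would.
counts, edges = np.histogram(clean, bins=10)
print(len(clean), counts.sum())
```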
Upvotes: 3
Reputation: 210872
You should use pandas or NumPy functions instead of the built-in Python ones:
In [7]: mydata['Has NaN'].min(), mydata['Has NaN'].max()
Out[7]: (-46.00309057827485, 62.430829637766671)
In [8]: min(mydata['Has NaN']), max(mydata['Has NaN'])
Out[8]: (nan, nan)
In [125]: mydata.plot.hist(alpha=0.5)
Out[125]: <matplotlib.axes._subplots.AxesSubplot at 0x1a784588>
Upvotes: 3