Chris

Reputation: 1170

Find the max value in a pandas DataFrame that also contains None values (Python 3.5)

I have a pandas DataFrame set up like this:

     Group1      Group2      Group3
0   0.04058678  0.04282689  0.06680679
1   0.11657916  0.06695174  0.05153584
2   0.08382576  0.03587087  0.08919266
3   0.17477007  0.08141088  0.10727157
4    0.0821453  0.08226264  0.06800853
5   0.15685707        None  0.09467674
6   0.08237982        None  0.14494069
7         None        None  0.14541177
8         None        None  0.12181681
9         None        None  0.17966472
10        None        None   0.1509818

I tried using df.max() to find the maximum value in the DataFrame, but it fails on this data, and I think the None entries in some fields are the cause.

I get this error:

print(df.max())  
TypeError: unorderable types: float() > str()

How do I deal with None in this dataframe so that I can get the maximum value?

Upvotes: 1

Views: 2587

Answers (2)

MaxU - stand with Ukraine

Reputation: 210882

Is that what you want?

Maximum element:

In [53]: df.replace('None', np.nan).max().max()
Out[53]: 0.17966472

or

In [46]: df.replace('None', -np.inf).max()
Out[46]:
Group3    0.179665
dtype: float64

Maximum per column:

In [35]: df.replace('None', np.nan).astype(float).max()
Out[35]:
Group1    0.174770
Group2    0.082263
Group3    0.179665
dtype: float64

Or the index of the max value in each column:

In [28]: df.replace('None', np.nan).astype('float').idxmax()
Out[28]:
Group1    3
Group2    4
Group3    9
dtype: int64

Explanation:

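First replace every 'None' with np.nan (not a number). The error message (float() > str()) shows that these entries are the string 'None', not a real None, which is why replace('None', ...) matches them: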

In [56]: df.replace('None', np.nan)
Out[56]:
        Group1      Group2    Group3
0   0.04058678  0.04282689  0.066807
1   0.11657916  0.06695174  0.051536
2   0.08382576  0.03587087  0.089193
3   0.17477007  0.08141088  0.107272
4    0.0821453  0.08226264  0.068009
5   0.15685707         NaN  0.094677
6   0.08237982         NaN  0.144941
7          NaN         NaN  0.145412
8          NaN         NaN  0.121817
9          NaN         NaN  0.179665
10         NaN         NaN  0.150982
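
To check whether the missing entries really are the string 'None' rather than real missing values, something like the following can help. This is only a sketch: the small frame below is a hypothetical stand-in for the question's data, assuming floats mixed with the literal string 'None'.

```python
import pandas as pd

# Hypothetical miniature of the question's data (assumption: the gaps
# are the literal string 'None', as the TypeError suggests).
df = pd.DataFrame({
    "Group2": [0.04282689, "None", "None"],
    "Group3": [0.06680679, 0.05153584, 0.08919266],
})

# Count cells equal to the literal string 'None' in each column.
string_none_counts = (df == "None").sum()

# Count real missing values (None / NaN) for comparison.
real_missing_counts = df.isna().sum()

print(string_none_counts)
print(real_missing_counts)
```

If the first count is non-zero and the second is zero, the gaps are strings, and they have to be replaced or coerced before max() can compare the values.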

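Find the maximum (returns a pandas Series). Only Group3 shows up here, apparently because the other two columns have not been converted to float yet, which is why the per-column version above adds astype(float):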

In [59]: df.replace('None', np.nan).max()
Out[59]:
Group3    0.179665
dtype: float64

In [67]: type(df.replace('None', np.nan).max())
Out[67]: pandas.core.series.Series

Find the maximum of that Series:

In [68]: df.replace('None', np.nan).max().max()
Out[68]: 0.17966472
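
Putting the steps together, here is a self-contained sketch. The frame is a reconstruction of the question's data under the assumption that the columns hold floats mixed with the string 'None'; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Reconstruction of the question's data (assumption: gaps are the string 'None').
df = pd.DataFrame({
    "Group1": [0.04058678, 0.11657916, 0.08382576, 0.17477007, 0.0821453,
               0.15685707, 0.08237982, "None", "None", "None", "None"],
    "Group2": [0.04282689, 0.06695174, 0.03587087, 0.08141088, 0.08226264,
               "None", "None", "None", "None", "None", "None"],
    "Group3": [0.06680679, 0.05153584, 0.08919266, 0.10727157, 0.06800853,
               0.09467674, 0.14494069, 0.14541177, 0.12181681, 0.17966472,
               0.1509818],
})

# Replace the 'None' strings with NaN, then force every column to float
# so that max() treats all of them as numeric.
cleaned = df.replace("None", np.nan).astype(float)

col_max = cleaned.max()        # maximum per column (NaN is skipped)
overall_max = col_max.max()    # single largest value in the frame
idx = cleaned.idxmax()         # row label of the maximum in each column

print(col_max)
print(overall_max)
print(idx)
```

Converting with astype(float) up front avoids depending on how a given pandas version treats non-numeric columns in max().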

Upvotes: 2

jezrael

Reputation: 863031

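I think you can use pd.to_numeric with errors='coerce', which turns anything that cannot be parsed as a number (such as 'None') into NaN:

print(df.apply(lambda x: pd.to_numeric(x, errors='coerce')).max())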
Group1    0.174770
Group2    0.082263
Group3    0.179665
dtype: float64

print(df.apply(lambda x: pd.to_numeric(x, errors='coerce')).idxmax())
Group1    3
Group2    4
Group3    9
dtype: int64
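
To go from the per-column result to the single largest value and its position, the same coercion can be extended as in the sketch below. The small frame is a hypothetical subset of the question's data, again assuming the gaps are the string 'None'.

```python
import pandas as pd

# Hypothetical subset of the question's data (assumption: gaps are 'None' strings).
df = pd.DataFrame({
    "Group1": [0.04058678, 0.17477007, "None"],
    "Group2": [0.04282689, "None", "None"],
    "Group3": [0.06680679, 0.10727157, 0.17966472],
})

# Coerce every column to numeric; unparseable entries become NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

col_max = numeric.max()          # per-column maxima
overall_max = col_max.max()      # largest value in the whole frame
col = col_max.idxmax()           # column that holds the overall maximum
row = numeric[col].idxmax()      # row label of that maximum

print(overall_max, row, col)
```

Passing errors='coerce' as a keyword to apply is equivalent to the lambda form above, since apply forwards extra keyword arguments to the function.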

Upvotes: 1
