Reputation: 1170
I have a pandas dataframe setup like this:
Group1 Group2 Group3
0 0.04058678 0.04282689 0.06680679
1 0.11657916 0.06695174 0.05153584
2 0.08382576 0.03587087 0.08919266
3 0.17477007 0.08141088 0.10727157
4 0.0821453 0.08226264 0.06800853
5 0.15685707 None 0.09467674
6 0.08237982 None 0.14494069
7 None None 0.14541177
8 None None 0.12181681
9 None None 0.17966472
10 None None 0.1509818
I tried using df.max() to find the maximum value in the dataframe, but it fails on this data. I think the None entries in some fields are the cause.
I get this error:
print(df.max())
TypeError: unorderable types: float() > str()
How do I deal with None in this dataframe so that I can get the maximum value?
Upvotes: 1
Views: 2587
Reputation: 210882
Is this what you want?
maximum element:
In [53]: df.replace('None', np.nan).max().max()
Out[53]: 0.17966472
or
In [46]: df.replace('None', -np.inf).max()
Out[46]:
Group3 0.179665
dtype: float64
maximum per column:
In [35]: df.replace('None', np.nan).astype(float).max()
Out[35]:
Group1 0.174770
Group2 0.082263
Group3 0.179665
dtype: float64
or the indexes of the max values:
In [28]: df.replace('None', np.nan).astype('float').idxmax()
Out[28]:
Group1 3
Group2 4
Group3 9
dtype: int64
Explanation:
first replace every 'None' string with np.nan (Not a Number):
In [56]: df.replace('None', np.nan)
Out[56]:
Group1 Group2 Group3
0 0.04058678 0.04282689 0.066807
1 0.11657916 0.06695174 0.051536
2 0.08382576 0.03587087 0.089193
3 0.17477007 0.08141088 0.107272
4 0.0821453 0.08226264 0.068009
5 0.15685707 NaN 0.094677
6 0.08237982 NaN 0.144941
7 NaN NaN 0.145412
8 NaN NaN 0.121817
9 NaN NaN 0.179665
10 NaN NaN 0.150982
find the maximum (returns a pandas Series; Group1 and Group2 still hold strings at this point, so only Group3 is reported):
In [59]: df.replace('None', np.nan).max()
Out[59]:
Group3 0.179665
dtype: float64
In [67]: type(df.replace('None', 0).max())
Out[67]: pandas.core.series.Series
find maximum in series:
In [68]: df.replace('None', 0).max().max()
Out[68]: 0.17966472
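The steps above can be put together in one self-contained sketch. It assumes the missing entries are the literal string 'None' rather than a real Python None, which is what the "float() > str()" error suggests. The small DataFrame is a hypothetical stand-in built from a few of the question's values, so the row labels differ from the full data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: Group1 and Group2 hold
# strings, with the literal string 'None' marking missing entries.
df = pd.DataFrame({
    "Group1": ["0.04058678", "0.11657916", "0.17477007", "None"],
    "Group2": ["0.04282689", "0.08226264", "None", "None"],
    "Group3": [0.06680679, 0.05153584, 0.17966472, 0.1509818],
})

# Replace the placeholder string with NaN, then cast every column to float
# so that comparisons are between numbers only.
cleaned = df.replace("None", np.nan).astype(float)

col_max = cleaned.max()       # per-column maxima; NaN is skipped by default
overall_max = col_max.max()   # largest single value in the whole frame
max_rows = cleaned.idxmax()   # row label of each column's maximum

print(col_max)
print(overall_max)
print(max_rows)
```

The astype(float) call is the important part: without it the string columns keep their original dtype and are not compared as numbers.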
Upvotes: 2
Reputation: 863031
I think you can use:
print (df.apply(lambda x: pd.to_numeric(x, errors='coerce')).max())
Group1 0.174770
Group2 0.082263
Group3 0.179665
dtype: float64
print (df.apply(lambda x: pd.to_numeric(x, errors='coerce')).idxmax())
Group1 3
Group2 4
Group3 9
dtype: int64
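Here is a minimal, self-contained sketch of the same idea, in case it helps to run it in isolation. The DataFrame is a hypothetical stand-in using a few values from the question, and it assumes the missing entries are stored as the string 'None'.

```python
import pandas as pd

# Hypothetical stand-in for the question's data; 'None' is a plain string.
df = pd.DataFrame({
    "Group1": ["0.0821453", "0.15685707", "None"],
    "Group2": ["0.08226264", "None", "None"],
    "Group3": [0.06800853, 0.09467674, 0.14541177],
})

# Convert each column to numbers; anything that cannot be parsed
# (here the 'None' strings) becomes NaN because of errors='coerce'.
numeric = df.apply(lambda col: pd.to_numeric(col, errors="coerce"))

col_max = numeric.max()      # per-column maxima, NaN skipped by default
max_rows = numeric.idxmax()  # row label of each column's maximum

print(col_max)
print(max_rows)
```

One thing to keep in mind: errors='coerce' turns every unparseable value into NaN, not only 'None', so a typo in the data would be silently dropped too.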
Upvotes: 1