Reputation: 107
Given a dataframe df:
>>> df = pd.DataFrame([[1., -2.5, True], [2.5, -1., False]])
>>> df
     0    1      2
0  1.0 -2.5   True
1  2.5 -1.0  False
>>> df.dtypes
0 float64
1 float64
2 bool
dtype: object
Taking the logarithm of the first two columns (a Pandas Dataframe) runs without errors.
>>> np.log(df.iloc[:,:2])
          0   1
0  0.000000 NaN
1  0.916291 NaN
I know that it does not make sense to take the logarithm of a boolean, but if I try to take the logarithm of the three columns (a Pandas Dataframe), I get the following error:
>>> np.log(df)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'float' object has no attribute 'log'
However, if I try to take the logarithm of only the third column, i.e., a Pandas Series, it runs without errors.
>>> np.log(df.iloc[:,2])
__main__:1: RuntimeWarning: divide by zero encountered in log
0 0.000000
1 -inf
Name: 2, dtype: float16
Just for the sake of curiosity: why are there two different behaviors when applying numpy.log to a Pandas boolean Series versus a Pandas Dataframe with a boolean column?
Upvotes: 1
Views: 423
Reputation: 210852
You can cast the whole DataFrame to float first, so that the boolean column becomes numeric:
In [15]: np.log(df.astype(float))
...
skipped warnings
...
Out[15]:
0 1 2
0 0.000000 NaN 0.000000
1 0.916291 NaN -inf
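As for why the cast helps, here is a minimal sketch of one likely explanation. It assumes the DataFrame is turned into a single NumPy array before the ufunc is applied, which is an assumption about the pandas version in the question; recent pandas versions may apply the ufunc block by block, in which case np.log(df) may not raise at all. Under that assumption, mixed float/bool columns yield an object array, and np.log on an object array looks for a log method on each element, whereas a boolean-only array is simply cast to float16.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1., -2.5, True], [2.5, -1., False]])

# Mixed float/bool columns have no common numeric dtype in pandas,
# so the combined array is of dtype object.
mixed = df.to_numpy()
print(mixed.dtype)  # object

# On an object array, np.log tries to call .log() on each Python object.
# Depending on the NumPy version this raises AttributeError or TypeError.
try:
    np.log(mixed)
    raised = False
except (AttributeError, TypeError) as exc:
    raised = True
    print(type(exc).__name__, exc)

# A boolean-only array is cast to the smallest float type, float16.
with np.errstate(divide="ignore"):
    bool_log = np.log(df.iloc[:, 2].to_numpy())
print(bool_log.dtype)  # float16
```

After astype(float) every column is float64, so the combined array is numeric and the object-dtype path is never taken.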
Upvotes: 2
Reputation: 7211
You can convert all the data to float in NumPy. However, some values will not have a defined result: the logarithm of a negative number is nan, and the logarithm of zero (False) is -inf.
df = pd.DataFrame([[1., -2.5, True], [2.5, -1., False]])
np.log(np.array(df,dtype=np.float64))
#result
array([[ 0. , nan, 0. ],
[ 0.91629073, nan, -inf]])
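To make the entries without a result explicit, the following sketch silences the warnings with np.errstate and flags every non-finite value. The names values, logs, and bad are only illustrative and do not come from the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1., -2.5, True], [2.5, -1., False]])
values = np.array(df, dtype=np.float64)

# Suppress the divide-by-zero (log of 0) and invalid-value
# (log of a negative number) warnings for this computation only.
with np.errstate(divide="ignore", invalid="ignore"):
    logs = np.log(values)

# Entries that are nan or infinite have no finite logarithm.
bad = ~np.isfinite(logs)
print(bad)
```

The mask can then be used to decide how to handle those cells, for example by replacing them or by dropping the affected rows.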
Upvotes: 2