Numpy Logarithm works for boolean Pandas Series but not for boolean column in Dataframe

Question

Given a dataframe df:

>>> df = pd.DataFrame([[1., -2.5, True], [2.5, -1., False]])
>>> df
     0    1      2
0  1.0 -2.5   True
1  2.5 -1.0  False
>>> df.dtypes
0    float64
1    float64
2       bool
dtype: object

Taking the logarithm of the first two columns (a Pandas Dataframe) runs without errors.

>>> np.log(df.iloc[:,:2])
          0   1
0  0.000000 NaN
1  0.916291 NaN

I know that it does not make sense to take the logarithm of a boolean, but if I try to take the logarithm of all three columns (still a Pandas DataFrame), I get the following error:

>>> np.log(df)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'float' object has no attribute 'log'

However, if I try to take the logarithm of only the third column, i.e., a Pandas Series, it runs without errors.

>>> np.log(df.iloc[:,2])
__main__:1: RuntimeWarning: divide by zero encountered in log
0    0.000000
1        -inf
Name: 2, dtype: float16

Just for the sake of curiosity: why does numpy.log behave differently when applied to a boolean Pandas Series than when applied to a Pandas DataFrame with a boolean column?

MaxU - stand with Ukraine · Accepted Answer

You can cast the whole DataFrame to float first and then take the logarithm:

In [15]: np.log(df.astype(float))
... (warnings skipped) ...
Out[15]:
          0   1         2
0  0.000000 NaN  0.000000
1  0.916291 NaN      -inf
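
As for why the two cases differ, here is a minimal sketch of what is probably going on, assuming the pandas version in the question passes the DataFrame to NumPy as one combined array (newer pandas releases may apply the ufunc column by column and not raise at all). Float and bool columns have no common numeric dtype, so the combined array is of dtype object, and on object arrays np.log tries to call a .log() method on each element. A single boolean column, on the other hand, is a plain bool array that NumPy can promote by itself. The variable names below are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1., -2.5, True], [2.5, -1., False]])

# Mixed float64 and bool columns have no common numeric dtype,
# so the combined array falls back to dtype=object.
mixed = df.to_numpy()
print(mixed.dtype)  # object

# On an object array, np.log looks for a .log() method on each element.
# Python floats have none, so the call fails (AttributeError on older
# NumPy, TypeError on newer releases).
try:
    np.log(mixed)
    failed = False
except (AttributeError, TypeError) as exc:
    failed = True
    print(type(exc).__name__, exc)

# A single boolean column is a plain bool array, which np.log accepts.
with np.errstate(divide="ignore"):
    bool_log = np.log(df[2].to_numpy())
print(bool_log.dtype)

# Casting everything to float first gives one homogeneous float64 array,
# so the ufunc has a normal numeric loop to use.
with np.errstate(divide="ignore", invalid="ignore"):
    result = np.log(df.astype(float))
print(result)
```

The np.errstate blocks only silence the divide-by-zero and invalid-value warnings mentioned above; the -inf and NaN entries are still there in the result.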
