Reputation: 501
I am trying to estimate the entropy of random variables (RVs), which involves computing the term p_X * log(p_X).
For example,
import numpy as np
X = np.random.rand(100)
binX = np.histogram(X, 10)[0] #create histogram with 10 bins
p_X = binX / np.sum(binX)
ent_X = -1 * np.sum(p_X * np.log(p_X))
Sometimes p_X will be zero, which mathematically makes the whole term zero. But Python evaluates p_X * np.log(p_X) as NaN in that case, which makes the whole summation NaN. Is there any way (without any explicit check for NaN) to make p_X * np.log(p_X) give zero whenever p_X is zero? Any insight and correction is appreciated, and thanks in advance :)
Upvotes: 6
Views: 887
Reputation: 53109
If you have scipy, use scipy.special.xlogy(p_X, p_X). Not only does it solve your problem; as an added benefit, it is also a bit faster than p_X * np.log(p_X).
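For example, a sketch applying this to the entropy computation from the question (using a hypothetical distribution with one empty bin to show that the NaN is gone):

```python
import numpy as np
from scipy.special import xlogy

# Hypothetical distribution with one empty bin
p_X = np.array([0.5, 0.5, 0.0])

# xlogy(x, y) computes x * log(y) and returns 0 where x == 0,
# so the empty bin no longer produces NaN
ent_X = -np.sum(xlogy(p_X, p_X))
print(ent_X)  # log(2) ≈ 0.693...
```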
Upvotes: 6
Reputation: 88305
You can use np.ma.log, which masks 0s, and then use the filled method to replace the masked entries with 0:
np.ma.log(p_X).filled(0)
For instance:
np.ma.log(range(5)).filled(0)
# array([0. , 0. , 0.69314718, 1.09861229, 1.38629436])
X = np.random.rand(100)
binX = np.histogram(X, 10)[0] #create histogram with 10 bins
p_X = binX / np.sum(binX)
ent_X = -1 * np.sum(p_X * np.ma.log(p_X).filled(0))
Upvotes: 4