Vigneswaran C

Reputation: 501

Handling zero multiplied with NaN

I am trying to estimate the entropy of random variables (RVs), which involves computing the term p_X * log(p_X). For example,

import numpy as np
X = np.random.rand(100)
binX = np.histogram(X, 10)[0]  # create histogram with 10 bins
p_X = binX / np.sum(binX)
ent_X = -1 * np.sum(p_X * np.log(p_X))

Sometimes p_X is zero, and mathematically the term 0 * log(0) should then be treated as zero. But NumPy evaluates np.log(0) as -inf, so p_X * np.log(p_X) becomes NaN for that bin and the whole sum becomes NaN. Is there a way (without explicitly checking for NaN) to make p_X * np.log(p_X) give zero whenever p_X is zero? Any insight or correction is appreciated. Thanks in advance :)
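A minimal sketch of where the NaN comes from, using a small hand-made distribution with one empty bin (the array values are assumed example data, not from the question):

```python
import numpy as np

# Assumed example distribution: the last bin is empty.
p_X = np.array([0.5, 0.5, 0.0])

# np.errstate only silences the "divide by zero" / "invalid value"
# RuntimeWarnings for this demo; the values are computed the same way.
with np.errstate(divide="ignore", invalid="ignore"):
    logs = np.log(p_X)    # log(0.0) -> -inf
    terms = p_X * logs    # 0.0 * -inf -> nan

print(logs[-1])           # -inf
print(terms[-1])          # nan
print(np.sum(terms))      # nan: one NaN term makes the whole sum NaN
```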

Upvotes: 6

Views: 887

Answers (3)

Paul Panzer

Reputation: 53109

If you have SciPy, use scipy.special.xlogy(p_X, p_X). It computes x * log(y) elementwise but returns 0 wherever x is 0, so it solves your problem directly; as an added benefit it is also a bit faster than p_X * np.log(p_X).
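A sketch of the question's entropy calculation rewritten with scipy.special.xlogy, assuming SciPy is installed. The small distribution p at the end is an assumed sanity check: its entropy should be log(2), since the empty bin contributes nothing.

```python
import numpy as np
from scipy.special import xlogy

X = np.random.rand(100)
binX = np.histogram(X, 10)[0]   # counts in 10 bins
p_X = binX / np.sum(binX)       # empirical probabilities

# xlogy(x, y) returns x * log(y), but 0 wherever x == 0,
# so empty bins contribute 0 instead of NaN.
ent_X = -np.sum(xlogy(p_X, p_X))

# Sanity check on an assumed distribution with an empty bin.
p = np.array([0.5, 0.5, 0.0])
check = -np.sum(xlogy(p, p))    # should equal log(2) ≈ 0.693
print(ent_X, check)
```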

Upvotes: 6

yatu

Reputation: 88305

You can use np.ma.log, which masks the 0s (log is undefined there), and then use the filled method to replace the masked entries with 0:

np.ma.log(p_X).filled(0)

For instance:

np.ma.log(range(5)).filled(0)
# array([0.        , 0.        , 0.69314718, 1.09861229, 1.38629436])

import numpy as np
X = np.random.rand(100)
binX = np.histogram(X, 10)[0]  # create histogram with 10 bins
p_X = binX / np.sum(binX)
ent_X = -1 * np.sum(p_X * np.ma.log(p_X).filled(0))  # masked log(0) filled with 0

Upvotes: 4

Dan

Reputation: 45762

In your case you can use nansum: each NaN comes from a 0 * log(0) term whose value should be 0, so ignoring that NaN in the sum is the same as adding 0:

ent_X = -1 * np.nansum(p_X * np.log(p_X))
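A small sketch of the nansum approach on an assumed example distribution with one empty bin. np.log(0) still triggers NumPy RuntimeWarnings, so this version wraps the calculation in np.errstate to silence them; the expected entropy here is 1.5 * log(2).

```python
import numpy as np

# Assumed example distribution with an empty last bin.
p_X = np.array([0.5, 0.25, 0.25, 0.0])

# The 0 * log(0) term still evaluates to NaN; errstate silences the
# warnings, and nansum skips the NaN (equivalent to adding 0 here).
with np.errstate(divide="ignore", invalid="ignore"):
    ent_X = -1 * np.nansum(p_X * np.log(p_X))

print(ent_X)   # ≈ 1.0397, i.e. 1.5 * log(2)
```

One trade-off: nansum drops every NaN, so a NaN from some other source (for example a negative probability caused by a bug) would also be silently ignored.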

Upvotes: 4
