Reputation: 69
I am trying to calculate the information entropy of a probability distribution, but I'm getting two different answers and I don't know why, or which one is correct.
import numpy as np
from scipy.special import entr
from scipy.stats import entropy

np.random.seed(123)
data = np.random.rand(5)

e = entropy(data, base=2)             # this one is different, why?
f = np.sum(entr(data)) / np.log(2)    # entr(x) = -x*log(x); divide by ln(2) to convert nats to bits
g = -np.sum(data * np.log2(data))     # Shannon entropy formula in base 2
Any idea where the error is?
Upvotes: 1
Views: 2197
Reputation: 114781
entropy automatically normalizes the input so that the probability vector sums to 1. Your calculations for f and g do not.
If you normalize data, e.g.,
data = np.random.rand(5)
data /= data.sum()
the results will agree:
In [35]: data = np.random.rand(5)
In [36]: data /= data.sum()
In [37]: entropy(data, base=2)
Out[37]: 2.2295987226926375
In [38]: np.sum(entr(data))/np.log(2)
Out[38]: 2.2295987226926375
In [39]: -np.sum(data*np.log2(data))
Out[39]: 2.2295987226926375
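For reference, entropy's documented behavior is to rescale the input so it sums to 1 before computing the entropy. A minimal sketch of that same normalization applied to the three calculations from the question (using the seeded data, so the values should also match the original e):

import numpy as np
from scipy.special import entr
from scipy.stats import entropy

np.random.seed(123)
data = np.random.rand(5)      # not a probability vector: does not sum to 1

p = data / data.sum()         # the rescaling entropy() applies internally

e = entropy(data, base=2)                 # normalizes data for you
f = np.sum(entr(p)) / np.log(2)           # entr(p) = -p*log(p); divide by ln(2) for bits
g = -np.sum(p * np.log2(p))               # Shannon entropy of the normalized vector

print(np.allclose([f, g], e))             # True: all three agree once p is used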
Upvotes: 7