Mubaraka

Reputation: 69

Scipy.stats.entropy is giving a different result from the entropy formula

I am trying to calculate the information entropy of a probability distribution, but I'm getting 2 different answers and I don't know why or which is correct.

  1. I tried using scipy.stats.entropy.
  2. I then looked at the source code for scipy.stats.entropy and, following it, calculated entropy using scipy.special.entr, but got a different answer.
  3. I then calculated entropy purely from the formula given on the scipy.stats.entropy documentation page, and got the same answer as in 2.
import numpy as np
from scipy.special import entr
from scipy.stats import entropy

np.random.seed(123)

data = np.random.rand(5)

e = entropy(data, base=2)           # this one is different, why?
f = np.sum(entr(data)) / np.log(2)  # entr, as used in the scipy source
g = -np.sum(data * np.log2(data))   # the formula from the scipy.stats.entropy docs

Any idea where the error is?

Upvotes: 1

Views: 2197

Answers (1)

Warren Weckesser

Reputation: 114781

entropy automatically normalizes the input so that the sum of the probability vector is 1. Your calculations for f and g do not.

If you normalize data, e.g.,

data = np.random.rand(5)
data /= data.sum()

the results will agree:

In [35]: data = np.random.rand(5)

In [36]: data /= data.sum()

In [37]: entropy(data, base=2)
Out[37]: 2.2295987226926375

In [38]: np.sum(entr(data))/np.log(2)
Out[38]: 2.2295987226926375

In [39]: -np.sum(data*np.log2(data))
Out[39]: 2.2295987226926375
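
Equivalently, here is a minimal sketch (the name p is just for illustration) showing that calling entropy on the unnormalized data from the question gives the same value as the hand formula once the normalization is written out explicitly:

import numpy as np
from scipy.stats import entropy

np.random.seed(123)
data = np.random.rand(5)          # unnormalized, as in the question

p = data / data.sum()             # the normalization entropy applies internally
print(entropy(data, base=2))      # scipy normalizes before computing
print(-np.sum(p * np.log2(p)))    # same value from the plain formula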

Upvotes: 7
