neha

Reputation: 2068

How to find entropy of Continuous variable in Python?

I have a variable whose values are like [23.13, 56.1, 12.6, 1.23, 5.56]. I want to find the entropy of this variable. I found some code in How to compute the shannon entropy and mutual information of N variables, but for a continuous variable, what bin size should be preferred?

Upvotes: 3

Views: 6242

Answers (2)

Vin

Reputation: 151

We can create a histogram of the variable and use the bins as a finite set of categories; this acts as a discrete version of the continuous variable. Alternatively, calculate the nth percentiles and use those as the category boundaries. Both are sketched below.
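A minimal sketch of both discretisations, using numpy.histogram, numpy.percentile/numpy.digitize, and scipy.stats.entropy; the bin count of 4 here is an arbitrary choice, and the result changes with it:

```python
import numpy as np
from scipy.stats import entropy

x = np.array([23.13, 56.1, 12.6, 1.23, 5.56])

# Fixed-width bins: the histogram counts define a discrete distribution.
counts, _ = np.histogram(x, bins=4)
h_hist = entropy(counts)  # scipy normalises the counts; entropy in nats

# Quantile-based categories: quartile edges, then a category label per value.
edges = np.percentile(x, [25, 50, 75])
labels = np.digitize(x, edges)
_, cat_counts = np.unique(labels, return_counts=True)
h_quant = entropy(cat_counts)
```

Note that equal-frequency bins push the binned distribution towards uniform, so the two discretisations can give quite different numbers for the same data.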

Upvotes: -1

Paul Brodersen

Reputation: 13021

There is no "best" bin size (unless your values fall into clearly distinct clusters).

For continuous distributions, you are better off using the Kozachenko-Leonenko k-nearest neighbour estimator for entropy (K & L 1987) and the corresponding Kraskov, ..., Grassberger (2004) estimator for mutual information.

The basic idea of the Kozachenko-Leonenko estimator is to look at (some function of) the average distance between neighbouring data points. The intuition is that if that distance is large, the dispersion in your data is large and hence the entropy is large. In practice, instead of the nearest-neighbour distance one takes the k-nearest-neighbour distance, which makes the estimate more robust.

I have implementations for both on my github: https://github.com/paulbrodersen/entropy_estimators.
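Not taken from that repository; just a minimal sketch of the estimator described above, assuming a Euclidean metric and no duplicate points (the function name kl_entropy is made up for illustration):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def kl_entropy(x, k=3):
    """Kozachenko-Leonenko k-nearest-neighbour entropy estimate, in nats:
    H ~ psi(n) - psi(k) + log(V_d) + (d / n) * sum_i log(r_i),
    where r_i is the distance from point i to its k-th neighbour and
    V_d is the volume of the d-dimensional unit ball.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-d sample as n points in R^1
    n, d = x.shape
    # Column 0 of the query result is each point itself (distance 0),
    # so column k holds the distance to the k-th genuine neighbour;
    # duplicate points would give r = 0 and log(r) = -inf.
    r = cKDTree(x).query(x, k=k + 1)[0][:, k]
    log_vd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    return digamma(n) - digamma(k) + log_vd + d * np.mean(np.log(r))
```

As a sanity check, for a large sample from a standard normal, e.g. kl_entropy(np.random.randn(10000)), the estimate should approach the true differential entropy, 0.5 * log(2 * pi * e) ≈ 1.42 nats.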

Upvotes: 8
