Reputation: 21
I am trying to wrap my head around the concept of information in the context of entropy. Let me first introduce some things to make clear what I mean by the terms I am using.
Entropy: [1]: https://en.wikipedia.org/wiki/Entropy_(information_theory)
"In information theory, the entropy of a random variable is the average level of "information", "surprise", or "uncertainty" inherent to the variable's possible outcomes."
H(p) = -\sum_{i=1}^n p_i \log_2(p_i)
So the question that came up for me was: what is information, and how do we quantify it? I have read many times that -log_2(p_i) (the solution to 2^x = 1/p_i) tells us how many bits of information the event i with probability p_i carries. For example, if I have a fair coin, the number of bits of information for tails (or heads) is -log_2(0.5) = 1, and the total entropy is H(p) = 0.5 * 1 + 0.5 * 1 = 1. This should give me the average amount of information (number of bits) I obtain when flipping the fair coin.
So far so good. But what if the coin isn't fair? Say p(heads) = 0.1 and p(tails) = 0.9. According to the definition I get H(p) = 0.468996, which tells me that on average I get only around 0.47 bits of information per flip of this coin. But why is there a difference? Intuitively, in both cases I only learn whether it's heads or tails, in other words zero or one, which is 1 bit. If I just want to obtain the result of the coin toss, I am not really interested in the probability of each event anyway. It is especially confusing to me that the information value of heads (-log_2(0.1)) is apparently much higher than that of tails (-log_2(0.9)).
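To make the two numbers above concrete, here is a minimal Python sketch (not from the question, just an illustration) that evaluates the entropy formula for both coins:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H(p) = -sum(p_i * log2(p_i)), skipping zero-probability events."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # fair coin -> 1.0
print(entropy([0.1, 0.9]))  # biased coin -> ~0.469
```

The per-event terms are also easy to inspect: `-math.log2(0.1)` is about 3.32 bits for the rare heads, while `-math.log2(0.9)` is about 0.15 bits for the common tails, and the entropy is their probability-weighted average.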
The only way I can make sense of the terminology is via the following example: imagine you want to find a mushroom in a forest that is split into two parts. One part is a third of the area and the other two thirds, and the mushroom's location is random (uniformly distributed). There is exactly one mushroom in the whole forest per season. If some magic machine tells you that it's in the first part, it makes sense to me that this message contains more information, since it effectively divides the area you have to search by a factor of 3. The essence is that if you were satisfied with only knowing in which part of the forest the mushroom is, you wouldn't care how large the area is (i.e. how high the probability is); it's just: is it the first or the second part.
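Working the mushroom example through the same formula (my own arithmetic, not part of the question): the message "first part" occurs with probability 1/3 and the message "second part" with probability 2/3, so their self-information and the average differ:

```python
import math

# Self-information of each message, in bits
info_first = -math.log2(1 / 3)    # ~1.585 bits: search area cut to 1/3
info_second = -math.log2(2 / 3)   # ~0.585 bits: search area cut to 2/3

# Average information (entropy) of the "which part?" message
H = (1 / 3) * info_first + (2 / 3) * info_second  # ~0.918 bits
```

So the rarer message really does carry more bits, and the average is below 1 bit precisely because the likely answer ("second part") narrows the search less.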
Upvotes: 1
Views: 439
Reputation: 11947
This is not a comprehensive answer, as that would take the format of a one-semester course on signal theory. Instead, I will try to give you a means to see the difference with your own eyes:
Write yourself a program that produces a character string of 0 and 1 characters, using a random number generator, for both Case A (the fair coin, p = 0.5 each) and Case B (the biased coin, p = 0.1 / 0.9).
Save the string to a file and compress both files with your favorite compression tool (e.g. ZIP or some runlength encoding etc.).
Compare the lengths of the compressed files, and relate them to the points you raised in your question. Why does the file from Case B achieve a higher compression rate?
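The experiment above can be sketched in a few lines of Python. I use `zlib` in place of an external ZIP tool, which is an assumption on my part, but deflate compression makes the same point:

```python
import random
import zlib

random.seed(0)
N = 100_000

# Case A: fair coin, p(0) = p(1) = 0.5
case_a = ''.join(random.choice('01') for _ in range(N)).encode()
# Case B: biased coin, p(0) = 0.1, p(1) = 0.9
case_b = ''.join('0' if random.random() < 0.1 else '1' for _ in range(N)).encode()

# The fair-coin string is nearly incompressible (~1 bit per symbol is the limit);
# the biased string compresses well below that, reflecting its lower entropy.
print(len(zlib.compress(case_a, 9)))
print(len(zlib.compress(case_b, 9)))
```

The compressed size of Case A stays close to the entropy bound of 1 bit per symbol, while Case B's long runs of 1s let the compressor approach its lower entropy of ~0.469 bits per symbol, which is exactly the difference the entropy formula predicts.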
Upvotes: 1