The Technomancer

Reputation: 13

Discrepancy Between Two Methods of Finding Information Entropy

So I learned about the concept of information entropy from Khan Academy, where it was phrased as "the average number of yes or no questions needed per symbol". They also gave an alternative form using logarithms.

So let's say we have a symbol generator that produces A, B, and C, with P(A)=1/2, P(B)=1/3, and P(C)=1/6. According to their method, I would get a chart like this: [First method]

Then I would multiply each symbol's probability by the number of questions needed for it, giving (1/2)*1 + (1/3)*2 + (1/6)*2 = 1.5 bits.

But their other method gives -(1/2)log2(1/2) - (1/3)log2(1/3) - (1/6)log2(1/6) = 1.459... bits.

The difference is small, but still significant. I've tried this with different combinations and probabilities and got similar results. Is there something I'm missing? Am I using one of the methods wrong, or does one of them only apply under certain conditions?
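For reference, here's a quick Python check of both calculations (probabilities and question counts hard-coded as above):

```python
from math import log2

p = {"A": 1/2, "B": 1/3, "C": 1/6}

# Method 1: average number of yes/no questions using the tree
# ("Is it A?" = 1 question; B and C each take 2 questions)
questions = {"A": 1, "B": 2, "C": 2}
avg_questions = sum(p[s] * questions[s] for s in p)

# Method 2: Shannon entropy, H = -sum(p * log2(p))
entropy = -sum(pr * log2(pr) for pr in p.values())

print(avg_questions)  # 1.5
print(entropy)        # 1.4591...
```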

Upvotes: 1

Views: 35

Answers (1)

Sneftel

Reputation: 41484

Your second calculation is correct.

The problem with your decision tree approach is that the decision tree is not optimal (and indeed, no binary decision tree could be for those probabilities). Your “is it B” decision node represents less than one bit of information, since once you get there you already know it’s probably B: conditioned on the answer to “is it A” being no, it’s B with probability 2/3 and C with probability 1/3, so that question yields only about 0.918 bits, and the expected information per symbol is 1 + (1/2)(0.918) ≈ 1.459 bits. So your decision tree represents a potential encoding of symbols which is expected to consume 1.5 bits on average, but it represents slightly less than 1.5 bits of information.

In order to have a binary tree which represents an optimal encoding, each decision node needs branches that are equally likely. That is only possible when every symbol's probability is a power of 1/2 (e.g. 1/2, 1/4, 1/8, ...), which is not the case here.
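As a rough illustration, here is a small Python sketch (the depths correspond to the questioner's tree) comparing the tree's expected cost against the entropy, and repeating the comparison for a distribution whose probabilities are all powers of 1/2, where the two quantities coincide:

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits: -sum(p * log2(p))."""
    return -sum(p * log2(p) for p in probs.values())

def expected_cost(probs, depths):
    """Expected number of yes/no questions for the given tree depths."""
    return sum(probs[s] * depths[s] for s in probs)

depths = {"A": 1, "B": 2, "C": 2}   # "is it A?", then "is it B?"

# Non-dyadic probabilities: no binary tree can reach the entropy
p1 = {"A": 1/2, "B": 1/3, "C": 1/6}
print(expected_cost(p1, depths), entropy(p1))   # 1.5 vs ~1.459

# Dyadic probabilities (all powers of 1/2): the same tree is optimal
p2 = {"A": 1/2, "B": 1/4, "C": 1/4}
print(expected_cost(p2, depths), entropy(p2))   # 1.5 vs 1.5
```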

Upvotes: 1
