9dan

Reputation: 4282

Probability of + character occurrence in base64 encoded string

I have over 6 million DB records containing base64-encoded string values.
These are the SHA-256 outputs of random 13-digit numbers.
When I counted the records containing a + character with a SQL LIKE query, I got over 3 million.

I want to know whether this is normal.
So I tried to calculate the probability of a + character occurring.

Could you confirm this calculation?

(64^44 - 63^44) / 64^44

(Base64 encoding consists of 64 characters.)

Wolfram Alpha says it's 0.5.
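
As a quick numerical check of the expression above, here is a minimal Python sketch. The function name `prob_at_least_one_plus` is made up for illustration, and the calculation assumes every character is independent and uniformly distributed over the 64-symbol alphabet, which is only an approximation for real base64 output.

```python
from fractions import Fraction


def prob_at_least_one_plus(n_chars: int) -> Fraction:
    """Probability that at least one of n_chars characters is '+',
    assuming each character is uniform over 64 symbols and independent."""
    return 1 - Fraction(63, 64) ** n_chars


# Same quantity as (64^44 - 63^44) / 64^44, evaluated exactly and then
# converted to a float for display.
p44 = prob_at_least_one_plus(44)
print(float(p44))  # about 0.4999
```

Using `Fraction` keeps the arithmetic exact until the final conversion, so the result does not depend on floating-point rounding of the large powers.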

Upvotes: 0

Views: 718

Answers (1)

meowgoesthedog

Reputation: 15035

  • Number of base-64 digits needed to represent a SHA-256 checksum = ceil(256 / log2(64)) = ceil(42.666...) = 43
  • Probability of one character not being + = 63/64
  • Probability of all characters not being + = (63/64)^43
  • Therefore probability of at least one being + = 1 - (63/64)^43 = (64^43 - 63^43) / (64^43)

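So your answer was almost correct - you just assumed the wrong number of digits (44 instead of 43). The numerical value is still close: about 0.492 with 43 digits versus about 0.500 with 44, so roughly 3 million matches out of 6 million records is what you would expect.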
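
The estimate can also be checked empirically. The sketch below is only an illustration under stated assumptions: it hashes the decimal string of each random 13-digit number (the question does not say how the numbers were serialized before hashing), uses the standard base64 alphabet, and the helper name `fraction_with_plus` is hypothetical.

```python
import base64
import hashlib
import random


def fraction_with_plus(n_samples: int, seed: int = 0) -> float:
    """Fraction of base64-encoded SHA-256 digests that contain '+'."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(n_samples):
        number = rng.randrange(10**12, 10**13)  # random 13-digit number
        # Assumption: the number is hashed as its ASCII decimal string.
        digest = hashlib.sha256(str(number).encode("ascii")).digest()
        encoded = base64.b64encode(digest).decode("ascii")  # 43 chars + '='
        if "+" in encoded:
            hits += 1
    return hits / n_samples


analytic = 1 - (63 / 64) ** 43
print(f"analytic (43 chars): {analytic:.4f}")
print(f"simulated:           {fraction_with_plus(20000):.4f}")
```

The simulated fraction should land close to the analytic value, though likely slightly below it: the 43rd character of a 32-byte digest carries only 4 data bits, so it can never be +, and the 44th is always the = padding.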

Upvotes: 3
