Reputation: 325
In the game Hangman, is it the case that a greedy letter-frequency algorithm is equivalent to a best-chance-of-winning algorithm?
Is there ever a case where it's worth sacrificing preservation of your remaining lives, for the sake of a better chance of guessing the correct answer?
Further clarification of the problem:
Motivation: This question is inspired by the interesting discussion at http://www.datagenetics.com/blog/april12012/index.html
They suggest an algorithm for solving the word game "Hangman" optimally.
Their strategy can be summarised thus (edited for clarification):
At each stage, we are guessing the letter (not previously guessed) which occurs in the largest number of remaining possible words.
There is some motivation to like this algorithm - we are always minimally likely to lose a life.
But, it strikes me that this isn't necessarily the best solution: if we're trying to guess the word (within a certain number of lives), it's not necessarily always the case that the most frequently occurring letter is the most useful letter to distinguish between the remaining available words.
I'm not sure, though, as it does seem opportune to avoid losing a life wherever possible. Will it ever be the case that optimal strategy permits us to sacrifice a life for a better opportunity to win?
Question: is it the case that this greedy algorithm is equivalent to a best-chance-of-winning algorithm, or not? And can you prove it?
An example dictionary+game would be ideal to show a disproof.
Upvotes: 21
Views: 32898
Reputation: 130
The strategy of choosing the most frequent letter in the pool of possible words maximizes the probability of guessing a letter correctly on that specific turn. However, this strategy does not maximize the overall winning probability. @Reinstate Monica's answer demonstrates this fact (which applies to much more than just the hangman game).
For example, consider the following two options: you must win 7 rounds in a row, each with a 90% winning probability, or win a single round with a 50% probability. At first it seems that taking the higher per-round probability is optimal, but 0.9⁷ ≈ 47.8% < 0.5.
Nevertheless, it is still a really good algorithm in terms of speed, simplicity and win rate. I coded it for Portuguese (my native language) and it achieves roughly a 97% win rate within the standard limit of 5 mistakes.
Upvotes: 0
Reputation: 1
An English example demonstrating that this is not optimal: suppose the pattern is KE??, you have eliminated all possibilities except KELL, KELP, KEMB, KEMP and KEWL, and you have one life left. Although L is the greedy choice, it could result in a loss, whereas P is a guaranteed win.
Upvotes: -1
Reputation: 58
The answer clearly shows why the greedy algorithm is not the best, but doesn't answer how much better we can get if we stray from the greedy path.
Assume all words are equally likely (as when playing against a computer opponent). In the 4-letters, 6-lives case, if you also consider the second most popular letter, your probability of winning increases from 55.2% to 58.2%; if you are willing to check one more letter, it increases to 59.1%.
Code: https://gist.github.com/anitasv/c9b7cedba324ec852e470c3011187dfc
A full analysis using ENABLE1 (a Scrabble dictionary with 172,820 words), for word lengths 2 to 6, 0 to 10 lives, and 1-greedy to 4-greedy strategies, gives the following results. Of course, at 25 lives every strategy is equivalent, with a 100% win rate, so I did not go beyond 10 lives. Going beyond 4-greedy improved the probability only slightly.
letters, lives, 1-greedy, 2-greedy, 3-greedy, 4-greedy
2 letters 0 lives 3.1% 3.1% 3.1% 3.1%
2 letters 1 lives 7.2% 7.2% 7.2% 8.3%
2 letters 2 lives 13.5% 13.5% 13.5% 14.5%
2 letters 3 lives 21.8% 21.8% 22.9% 22.9%
2 letters 4 lives 32.2% 33.3% 33.3% 33.3%
2 letters 5 lives 45.8% 45.8% 45.8% 45.8%
2 letters 6 lives 57.2% 57.2% 57.2% 57.2%
2 letters 7 lives 67.7% 67.7% 67.7% 67.7%
2 letters 8 lives 76% 76% 76% 76%
2 letters 9 lives 84.3% 84.3% 84.3% 84.3%
2 letters 10 lives 90.6% 91.6% 91.6% 91.6%
3 letters 0 lives 0.9% 1.1% 1.1% 1.1%
3 letters 1 lives 3.4% 3.8% 3.9% 3.9%
3 letters 2 lives 7.6% 8.4% 8.6% 8.8%
3 letters 3 lives 13.7% 15% 15.1% 15.2%
3 letters 4 lives 21.6% 22.8% 23.3% 23.5%
3 letters 5 lives 30.3% 32.3% 32.8% 32.8%
3 letters 6 lives 40.5% 42% 42.3% 42.5%
3 letters 7 lives 50.2% 51.4% 51.8% 51.9%
3 letters 8 lives 59.6% 60.9% 61.1% 61.3%
3 letters 9 lives 68.7% 69.8% 70.4% 70.5%
3 letters 10 lives 77% 78.3% 78.9% 79.2%
4 letters 0 lives 0.8% 1% 1.1% 1.1%
4 letters 1 lives 3.7% 4.3% 4.4% 4.5%
4 letters 2 lives 9.1% 10.2% 10.6% 10.7%
4 letters 3 lives 18% 19.4% 20.1% 20.3%
4 letters 4 lives 29.6% 31.3% 32.1% 32.3%
4 letters 5 lives 42.2% 44.8% 45.6% 45.7%
4 letters 6 lives 55.2% 58.2% 59.1% 59.2%
4 letters 7 lives 68% 70.4% 71.1% 71.2%
4 letters 8 lives 78% 80.2% 81% 81.1%
4 letters 9 lives 85.9% 87.8% 88.4% 88.7%
4 letters 10 lives 92.1% 93.3% 93.8% 93.9%
5 letters 0 lives 1.5% 1.8% 1.9% 1.9%
5 letters 1 lives 6.1% 7.5% 7.9% 8%
5 letters 2 lives 15.9% 18.3% 18.9% 19.2%
5 letters 3 lives 30.1% 34.1% 34.8% 34.9%
5 letters 4 lives 47.7% 51.5% 52.3% 52.5%
5 letters 5 lives 64.3% 67.4% 68.3% 68.5%
5 letters 6 lives 77.6% 80.2% 80.6% 80.8%
5 letters 7 lives 86.9% 88.6% 89.2% 89.4%
5 letters 8 lives 92.8% 94.1% 94.4% 94.5%
5 letters 9 lives 96.4% 97.1% 97.3% 97.3%
5 letters 10 lives 98.2% 98.6% 98.8% 98.8%
6 letters 0 lives 3.2% 3.7% 3.9% 3.9%
6 letters 1 lives 12.6% 14.3% 14.9% 15%
6 letters 2 lives 29.2% 32.2% 32.8% 33%
6 letters 3 lives 50.1% 53.4% 54.2% 54.4%
6 letters 4 lives 69.2% 72.4% 73.1% 73.2%
6 letters 5 lives 83.1% 85.5% 85.9% 86.1%
6 letters 6 lives 91.5% 92.9% 93.2% 93.2%
6 letters 7 lives 95.8% 96.5% 96.7% 96.8%
6 letters 8 lives 97.9% 98.3% 98.4% 98.5%
...
Upvotes: 0
Reputation: 33931
I have written a script that solves hangman optimally [github].
My basic strategy is to filter the remaining candidate words with a regular expression built from the revealed pattern and the letters guessed so far. For example, if the third letter is known to be e and the letters e, s and t have already been guessed, the candidates must match: [^est][^est]e[^est][^est]
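A small, hypothetical sketch of building that kind of filter (not taken from the linked script):

import re

# Hypothetical helper illustrating the regex idea: unknown positions may not
# contain any letter guessed so far, known positions must match exactly.
def candidate_regex(pattern, guessed):
    """pattern like '__e__', guessed like 'est' -> compiled regex."""
    blank = "[^" + re.escape(guessed) + "]"
    return re.compile("".join(blank if c == "_" else c for c in pattern) + "$")

words = ["crest", "ameba", "wheel", "khaki", "creak"]
rx = candidate_regex("__e__", "est")
print([w for w in words if rx.match(w)])   # ['ameba', 'creak']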
Upvotes: 4
Reputation: 91142
There are some critical assumptions you have to make as to what a game of "Hangman" is.
One thing to remember is that if you pick a correct letter, you do not lose a guess.
I will provide a solution for the one-word-&-equally-likely case. The two-word case can be generalized by creating a new dictionary equal to the cartesian product of your current dictionary, etc. The more-likely-than-others case can be generalized with a bit of probability.
Before we define our algorithm, we define the concept of a reduction. If we were to guess letters L1, L2, L3, ..., LN all at ONCE, then we would reduce the dictionary to a smaller dictionary: some words would be eliminated, and additionally some letters may also be eliminated. For example, if we had the dictionary {dog, cat, rat} and we guessed a, we would eliminate {d, o, g} if the guess was true (leaving {cat, rat}), or eliminate {c, r, t} if it was false (leaving {dog}).
The optimal algorithm is, in essence, full game-tree search: consider every possible guess, branch on every possible outcome (reduction), recurse, and always take the guess that maximizes the overall probability of winning.
Of course that is how you solve any game, and for the most part it is intractable due to the exponential size requirements. You cannot get optimal unless you perfectly replicate this, and I seriously doubt that a strategy which doesn't "look" two or more moves ahead can hope to replicate this. You can however attempt to approximate the optimal strategy as follows.
Repeat the following at each step: consider each letter L that has not yet been guessed, and choose the one which maximizes the expected dictionary reduction per expected penalty, i.e. which maximizes
(frac words with L * #words without L + frac words without L * #words with L) / (#words without L / #words total)
...note that this may be infinite if all the words contain a certain letter, in which case go ahead and guess it, since there is no penalty.
Of course, if your dictionary has more than 2^[max number of guesses] entries, the "Hangman" game is mostly impossible in the equal-probability world (unless the dictionary is highly constrained), so you have to work in the unequal-probability world. In this world, rather than maximizing the amount of elimination you do, you maximize the "expected surprisal" (also called entropy). With each word you associate a prior probability (e.g. let's say there is a 0.00001 chance of the word being 'putrescent' and a 0.002 chance of the word being 'hangman'). The surprisal of a word is the negative log of its probability, measured in bits. An answer to a guess will either yield no letters, a single letter, or more than one letter (many possibilities). Thus, for a three-letter word, guessing A can reveal any of {A__, _A_, __A, AA_, A_A, _AA, AAA}. For each outcome, calculate its probability using Bayes's rule, and also the new possible dictionary (e.g. in one case you'd have a dictionary of _A_:{cAt, bAt, rAt, ...}, in another A__:{Art, Ark, Arm, ...}, etc.). Each of these new dictionaries also has a likelihood ratio, of the form size(postOutcomeDictionary)/size(preOutcomeDictionary); the negative log of this ratio is the amount of information (in bits) which that outcome conveys to you: bitsGainedFromOutcome[i] = -log(size(postOutcomeDictionary)/size(preOutcomeDictionary)). We take the weighted sum of these, sum{i}( prob(outcome[i]) * bitsGainedFromOutcome[i] ), then divide by the probability we are wrong, prob(outcome=='___'). Pick the letter which maximizes sum{i}( prob(outcome[i]) * bitsGainedFromOutcome[i] ) / prob(outcome=='___'); in case this is infinite, there is nothing to lose, and we automatically pick it.
So to answer your question:
>In the game Hangman, is it the case that a greedy letter-frequency algorithm is equivalent to a best-chance-of-winning algorithm?
Clearly not: if the dictionary was
cATs
bATs
rATs
vATs
sATe
mole
vole
role
your algorithm would guess a or t, which have a 5/8 chance of reducing your dictionary to 5/8 of its size for free, and a 3/8 chance of reducing your dictionary to 3/8 of its size at a cost of 1. You want to choose letters which reveal the most information. In this case, you should guess S, because it has a 4/8 chance of reducing your dictionary to 4/8 of its size for free, a 1/8 chance of reducing your dictionary to 1/8 of its size for free, and a 3/8 chance of reducing your dictionary to 3/8 of its size at a cost of 1. This is strictly better.
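To make this concrete, here is a minimal sketch (my own illustration, not part of the original answer) of the expected-information-per-miss score described above, assuming all eight words are equally likely:

from math import log2
from collections import Counter

# Score a guess by expected bits of information gained per unit of miss
# probability, grouping words by the positions the guess would reveal.
def info_per_miss(letter, words):
    outcomes = Counter(
        tuple(i for i, c in enumerate(w) if c == letter) for w in words
    )                                              # () means the guess misses
    n = len(words)
    bits = sum(k / n * -log2(k / n) for k in outcomes.values())   # expected surprisal
    misses = outcomes.get((), 0) / n                              # chance of losing a life
    return float("inf") if misses == 0 else bits / misses

words = ["cats", "bats", "rats", "vats", "sate", "mole", "vole", "role"]
for letter in "ats":
    print(letter, info_per_miss(letter, words))
# 's' scores higher than 'a' or 't', matching the argument above.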
edit: I wanted to use an English dictionary example (above) to demonstrate how this is not artificial, and assumed that people could extrapolate from the example without being hung up on the non-strict equality. However, here is an unambiguously clear counterexample: you have 2000 words. 1000 words contain the letter A in the first position. The other 1000 words contain a unique combination of Bs embedded elsewhere, for example ?B???, ??B??, ???B?, ????B, ?BB??, ?B?B?, etc. The ?s represent randomly-chosen characters. There are no As in the first ?, except for one word (whose ? is an 'A'), so that the frequency of As is strictly greater than the frequency of Bs. The proposed algorithm would guess A, which would result in {50%: choices_halved, 50%: choices_halved & lose_one_life}, whereas this algorithm would dictate the choice B, which results in {50%: YOU_WIN, 50%: choices_halved & lose_one_life}. Percentages have been rounded very slightly. (And no, a word with double letters does not contribute twice to the 'frequency', but even if it did under an insane definition, you could trivially modify this example by making the words begin with AAA...A.)
(regarding comments: It is unreasonable to complain about strict equality in this example, e.g. "999/2000", since you can make the probabilities arbitrarily close to each other.)
(Which points out an amusing side-note: If the dictionary is large enough to make hangman impossible sometimes, a strategy ought to throw away guesses that it does not expect to be able to guess. For example if it only has 2 moves left, it ought to make the highest-probability assumption it can which eliminates subtrees with more than 2-moves worth of bits of surprise.)
Upvotes: 8
Reputation: 4723
Assume the following dictionary: ABC ABD AEF EGH. (I'll capitalize unguessed letters.)
Assume you have only 1 life (makes the proof so much easier...).
The algorithm proposed above:
Starting with A, you lose (1/4) or are left with aBC aBD aEF (3/4).
Now guess B and lose (1/3) or are left with abC abD (2/3).
Now guess C or D and you lose (1/2) or win (1/2).
Probability to win: 3/4 * 2/3 * 1/2 = 1/4.
Now try something else:
Starting with E, you lose (1/2) or are left with AeF eGH (1/2).
Now you know what to guess and win.
Probability to win: 1/2.
Clearly the proposed algorithm is not optimal.
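To check the arithmetic, here is a small sketch (my own, assuming equally likely words and modelling "only 1 life" as "the first wrong guess loses") that evaluates both strategies on this dictionary:

# Evaluate a guessing strategy by recursing over the outcomes of each guess.
def pattern(word, guessed):
    return "".join(c if c in guessed else "_" for c in word)

def win_probability(candidates, guessed, mistakes_left, choose):
    if len(candidates) == 1:
        return 1.0                        # one word left: spell it out safely
    letter = choose(candidates, guessed)
    guessed = guessed | {letter}
    groups = {}
    for w in candidates:                  # group words by the revealed outcome
        groups.setdefault(pattern(w, guessed), []).append(w)
    total = 0.0
    for pat, group in groups.items():
        p = len(group) / len(candidates)
        if letter not in pat:             # the guess missed for these words
            if mistakes_left == 0:
                continue                  # those games are lost
            total += p * win_probability(group, guessed, mistakes_left - 1, choose)
        else:
            total += p * win_probability(group, guessed, mistakes_left, choose)
    return total

def most_frequent(candidates, guessed):   # the greedy rule from the question
    letters = set("".join(candidates)) - guessed
    return max(letters, key=lambda L: sum(L in w for w in candidates))

def e_first(candidates, guessed):         # the alternative proposed here
    return "E" if "E" not in guessed else most_frequent(candidates, guessed)

words = ["ABC", "ABD", "AEF", "EGH"]
print(win_probability(words, set(), 0, most_frequent))  # ~0.25
print(win_probability(words, set(), 0, e_first))        # ~0.5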
Upvotes: 11
Reputation: 5114
About your idea to try to guess the word instead of trying to guess letters: there may certainly be some isolated cases where you guess the word on the first try or something like that, but this doesn't make that algorithm better in the average case. It's about expected probability.
Some improvement that could be done to that algorithm (in the version posted by Lajos) is some more informed pick of letter to be tried.
This could be achieved by adding one more weight: consider the usage frequency of each word in the vocabulary of the language.
For example, technical, medical, juridical etc. words would have much lower chances.
Take this dictionary (with some example usage weights):
frost 6
guilt 5
genes 1
fever 2
meter 1
Here e is the most frequent letter, so it would get chosen. This would mean leaving only the medical terms, which are very unlikely. But if the decision is taken by:
weight(letter) = w * frequency + (1 - w) * sum( usage_weight(word) where letter in word )
then, most probably, t would be chosen. Because (let's say w = 0.2):
weight(e) = 0.2 * 3 + 0.8 * (1 + 2 + 1) = 3.8
weight(r) = 0.2 * 3 + 0.8 * (1 + 2 + 6) = 7.8
weight(t) = 0.2 * 3 + 0.8 * (1 + 5 + 6) = 10.2
Note: We should, of course, use normalized weights, like frequency / num_words, to get accurate weight measuring.
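A small sketch of the weighted score above, with w = 0.2 and the example usage weights ("frequency" here is the number of remaining words containing the letter):

# Weighted letter score combining letter frequency and word usage weights.
usage = {"frost": 6, "guilt": 5, "genes": 1, "fever": 2, "meter": 1}
w = 0.2

def weight(letter):
    containing = [word for word in usage if letter in word]
    return w * len(containing) + (1 - w) * sum(usage[word] for word in containing)

for letter in "ert":
    print(letter, round(weight(letter), 1))   # e -> 3.8, r -> 7.8, t -> 10.2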
Note: This algorithm can and should be adapted to the opponent.
Upvotes: 3
Reputation: 76968
No, this greedy algorithm is not the best at all and I can prove it by giving a better solution:
In each step we know the number of letters and we know some letters. From our set of words we choose all the words which have the given letters at the given positions and whose length matches the length of the word in question. We select the most frequent letter in that subset of words and guess it. Every guessed letter is marked as guessed and is never guessed again. This way you have a much better chance of survival than with the greedy algorithm described in your question.
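A minimal sketch of one step of this procedure (my own illustration, not the author's code; it assumes the usual rule that a correct guess reveals every occurrence of the letter, so blank positions cannot hold an already-guessed letter):

from collections import Counter

# One filtering + most-frequent-letter step. `pattern` uses '_' for unknown
# positions; `guessed` holds every letter tried so far, right or wrong.
def next_guess(dictionary, pattern, guessed):
    candidates = [
        w for w in dictionary
        if len(w) == len(pattern)
        and all(c == p if p != "_" else c not in guessed
                for p, c in zip(pattern, w))
    ]
    counts = Counter(c for w in candidates for c in set(w) if c not in guessed)
    return counts.most_common(1)[0][0], candidates

letter, remaining = next_guess(
    ["cats", "bats", "rats", "mole", "vole", "cat"], "_at_", set("at")
)
print(letter, remaining)   # 's', ['cats', 'bats', 'rats']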
EDIT:
After the question was clarified and further specifications were made, I've come to the conclusion that the algorithm is optimal.
If we have n words of the given length, each containing all the right guesses ("good letters") and none of the wrong guesses ("bad letters"), and x lives, we can look at the frequency of letters in the still-possible words and select the letter with the highest frequency; suppose y of the words contain that letter.
In this case, the confidence rate of this guess is y/n, which is higher than the confidence rate of any other letter and so gives a higher chance of survival. So such a step is optimal. If you make a series consisting only of steps in this spirit, the series of steps will be optimal too, so this algorithm is optimal.
But if we have any extra information (like a description of the word, or knowledge that the same word is unlikely to appear twice in a short period), this algorithm can be enhanced based on that extra information: all the words in the set of words get a fitness value (a probability based on the extra information), and all the letter types in the words are weighted based on the fitness score.
Upvotes: 0
Reputation: 906
Choose a letter that divides the remaining valid words into 2 sets of nearly equal size. With positional information there could be more than 2 sets. Repeat until your set has size 1. That is the best way. The proof is left as an exercise.
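A rough sketch of that balanced-split heuristic (my own illustration, with arbitrary sample data): pick the unguessed letter whose revealed-position outcomes split the remaining words most evenly, i.e. whose largest part is smallest.

from collections import Counter

# Partition candidates by the positions at which a letter would be revealed,
# then pick the letter with the most balanced partition.
def best_splitting_letter(candidates, guessed):
    def largest_part(letter):
        parts = Counter(
            tuple(i for i, c in enumerate(word) if c == letter)
            for word in candidates
        )
        return max(parts.values())
    letters = set("".join(candidates)) - set(guessed)
    return min(letters, key=largest_part)

print(best_splitting_letter(["kell", "kelp", "kemb", "kemp", "kewl"], "ke"))  # 'l'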
Upvotes: -1