Reputation: 18547
Given a grammar list and string list
grammar list is a list sort by probability of occurrence that looks like (p is probability)
123W456 (p=0.9)
1%W3W456 (p=0.8)
...
where W
is string (alphabetical words),
W
is selected from the string list (sort by p), which looks like
Hello (P=0.9)
Hi (P=0.8)
...
Therefore the word 123Hello456
has probability (p=0.9*0.9)
and the word 1%Hi3Hello456
has the probability (p=0.8*0.8*0.9)
My problem is given a string say 1%Hi3Hi456
, I want to get the number of words that has a higher probability than 1%Hi3Hi456
(p=0.8*0.8*0.8)
.
What I am thinking now is the to do by brute force, generate of words in descending probabilities (e.g., 123Hello456
,...) until 1%Hi3Hi456
.
Is there a more efficient method?
Upvotes: 0
Views: 65
Reputation: 34839
It's given that the probability of 1%Hi3Hi456
is 0.8*0.8*0.8 = 0.512
, and we're looking for strings that have higher probability.
Starting at the beginning of the grammar list, we see that 123W456
has a probability of 0.9
. Using that information, we can compute the minimum probability of W
as
p = 0.512 / 0.9 = 0.56888
Which is to say that any word in the word list that has a probability greater than 0.56888
will work with the grammar 123W456
. If the word list is sorted by probability, then a binary search can be used to determine the number of acceptable words in O(logN) time.
The next grammar, 1%W3W456
, is a little more problematic. Dividing the grammar probability, we have
p = 0.512 / 0.8 = 0.64
but that leaves the problem that p1 * p2 > 0.64
where p1
and p2
are the probabilities of two words in the word list. One way to solve is to choose the first word with a linear search for words with p1 > 0.64
, and then do the binary search for words with
p2 > 0.64 / p1
Upvotes: 1