Reputation: 49107
I am reading the Lucene in Action book and I do not understand the multi-term phrases part.
The following text is indexed:
the quick brown fox jumped over the lazy dog
And then you add the following terms to the PhraseQuery: quick jumped lazy, with a slop equal to 4. That results in a match, but I don't understand how that happens. How do you calculate the number of moves when there are multiple terms?
The same goes for the terms lazy jumped quick with a slop equal to 8.
Upvotes: 1
Views: 191
Reputation: 33351
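The slop is effectively an edit distance: the number of single-position moves needed to rearrange the query terms into the order and spacing they have in the document. Each extra position opened between two terms adds 1 to the distance, and transposing two adjacent terms adds 2 (the first move places the two terms at the same position, the second moves one past the other).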
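The rule above can also be checked numerically. Here is a rough Python sketch, not Lucene's actual code: for each query term, take its position in the document minus its position in the query, and the smallest slop that still matches is the spread (max minus min) of those offsets. The names `required_slop` and `TEXT` are made up for illustration. The sketch assumes plain whitespace tokenization and that each query term occurs exactly once in the document.

```python
TEXT = "the quick brown fox jumped over the lazy dog"


def required_slop(text, phrase):
    """Smallest slop that lets `phrase` match `text` in this simplified model.

    Assumption: every query term occurs exactly once in the text. Real
    Lucene also handles repeated terms and multiple occurrences.
    """
    tokens = text.split()
    offsets = []
    for query_pos, term in enumerate(phrase.split()):
        if tokens.count(term) != 1:
            raise ValueError(f"term {term!r} must occur exactly once")
        doc_pos = tokens.index(term)
        # How far this term sits from where an exact phrase would put it.
        offsets.append(doc_pos - query_pos)
    # An exact phrase gives identical offsets (spread 0); every
    # single-position move of a term widens the spread by at most 1.
    return max(offsets) - min(offsets)


print(required_slop(TEXT, "quick jumped lazy"))  # 4
print(required_slop(TEXT, "lazy jumped quick"))  # 8
print(required_slop(TEXT, "brown quick"))        # 2, a plain transposition
```

Both results agree with the slop values from the book, and the last line shows the cost of 2 for swapping two adjacent terms.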
You can go through the edits one at a time to illustrate:
quick jumped lazy
distance: 0
quick _ jumped lazy
distance: 1
quick _ _ jumped lazy
distance: 2
quick _ _ jumped _ lazy
distance: 3
quick _ _ jumped _ _ lazy
distance: 4
And for the second case:
lazy jumped quick
distance: 0
lazy/jumped quick
distance: 1
lazy/jumped/quick
distance: 2 (all three terms superimposed, in the same position)
quick lazy/jumped
distance: 3
quick jumped lazy
distance: 4
quick _ jumped lazy
distance: 5
quick _ _ jumped lazy
distance: 6
quick _ _ jumped _ lazy
distance: 7
quick _ _ jumped _ _ lazy
distance: 8
Upvotes: 2