LuckyLuke

Reputation: 49107

Multi-term phrases in Lucene

I am reading the Lucene in Action book and I do not understand the multi-term phrases part.

The following text is indexed:

the quick brown fox jumped over the lazy dog

Then you add the following terms to the PhraseQuery: quick jumped lazy, with a slop of 4. That results in a match, but I don't understand why. How do you calculate the number of moves when there are more than two terms?

The same goes for the terms lazy jumped quick with a slop of 8.

Upvotes: 1

Views: 191

Answers (1)

femtoRgon

Reputation: 33351

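The slop is effectively an edit distance: the number of single-position moves needed to turn the query phrase into the arrangement found in the document. Moving a term one position further away from its neighbour adds 1 to the distance. Transposing two adjacent terms adds 2: the first move puts both terms on the same position, and the second moves one past the other.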

You can go through the edits one at a time to illustrate:

  • quick jumped lazy distance:0
  • quick _ jumped lazy distance:1
  • quick _ _ jumped lazy distance:2
  • quick _ _ jumped _ lazy distance:3
  • quick _ _ jumped _ _ lazy distance:4

And for the second case:

  • lazy jumped quick distance:0
  • lazy/jumped quick distance:1
  • lazy/jumped/quick distance:2 (all three terms superimposed, in the same position)
  • quick lazy/jumped distance:3
  • quick jumped lazy distance:4
  • quick _ jumped lazy distance:5
  • quick _ _ jumped lazy distance:6
  • quick _ _ jumped _ lazy distance:7
  • quick _ _ jumped _ _ lazy distance:8
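
If you would rather compute the number than count the moves, here is a minimal sketch in plain Java (no Lucene dependency). It assumes a simplified model of sloppy matching: for each query term, take its position in the document minus its offset in the query, and use the spread between the largest and smallest of those values as the distance. It also assumes each query term occurs exactly once in the document. `SlopSketch` and `requiredSlop` are names made up for this illustration and are not part of the Lucene API.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of sloppy phrase matching; NOT Lucene's actual code.
// Assumption: every query term occurs exactly once in the document.
class SlopSketch {

    // Returns the smallest slop at which the phrase would match,
    // or -1 if a query term is missing from the document.
    static int requiredSlop(List<String> docTokens, List<String> phraseTerms) {
        if (phraseTerms.isEmpty()) {
            return 0; // nothing to line up
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int offset = 0; offset < phraseTerms.size(); offset++) {
            int position = docTokens.indexOf(phraseTerms.get(offset));
            if (position < 0) {
                return -1; // term not in the document: no slop can help
            }
            // Where the phrase would have to start for this term to line up.
            int shifted = position - offset;
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        // The terms disagree about the start position by this many moves.
        return max - min;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList(
            "the quick brown fox jumped over the lazy dog".split(" "));
        System.out.println(requiredSlop(doc, Arrays.asList("quick", "jumped", "lazy")));
        System.out.println(requiredSlop(doc, Arrays.asList("lazy", "jumped", "quick")));
    }
}
```

For the indexed sentence this gives 4 for quick jumped lazy and 8 for lazy jumped quick, the same totals as the step-by-step lists. Two adjacent terms in reverse order (for example brown quick) come out as 2, which is the transposition cost.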

Upvotes: 2
