Markus Foss

Reputation: 445

Solr Fuzzy search in multiValued field with max distance between terms

Hello stackOverflowers

I have a Solr document collection with a field called names_txt. It is a multiValued="true" field.

This field contains the names of all persons associated with a document.

I want to be able to do a fuzzy search and at the same time limit the number of terms between the two matching terms.

The query

names_txt:("markus foss"~2)

This returns all documents that contain the terms markus and foss with at most 2 terms between them.

But when I search in a fuzzy way AND also want to specify the maximum number of terms between the matches, I can't get the syntax right.

The query:

names_txt:(markus~0.7 foss~0.7)

This works, but it returns false positives, since it will match a document with "markus something" in one value and "foss somethingElse" in another.

What I would like to write is:

(markus~0.7 foss~0.7)~2
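
To make the intended semantics concrete, here is a minimal sketch in plain Python (not Solr, and not how Solr evaluates queries). It assumes whitespace tokenization, plain Levenshtein distance without transpositions, and that "distance" means the number of tokens between the two matches. The function names are made up for illustration.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def value_matches(value, first, second, max_edits=1, max_gap=2):
    """True if both terms fuzzy-match tokens of ONE value, close together."""
    tokens = value.lower().split()
    hits_a = [i for i, t in enumerate(tokens) if edit_distance(t, first) <= max_edits]
    hits_b = [i for i, t in enumerate(tokens) if edit_distance(t, second) <= max_edits]
    return any(i != j and abs(i - j) - 1 <= max_gap
               for i in hits_a for j in hits_b)


def doc_matches(values, first, second, max_edits=1, max_gap=2):
    """A multi-valued field matches only if a single value matches."""
    return any(value_matches(v, first, second, max_edits, max_gap) for v in values)


wanted = ["Marcus Foss", "Anna Berg"]
false_positive = ["markus something", "foss somethingElse"]
print(doc_matches(wanted, "markus", "foss"))
print(doc_matches(false_positive, "markus", "foss"))
```

The first document should match (both names sit in the same value, with one edit in "Marcus"); the second should not, because the two terms live in different values.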

Anyone out there have a solution for my problem?

Upvotes: 3

Views: 1320

Answers (2)

FrKunze

Reputation: 300

Since a single query clause in Solr can apply either a word-distance constraint or a fuzzy-search constraint, but not both, we need two clauses:

names_txt:("markus foss"~2) AND names_txt:(markus~0.7 foss~0.7)

Note that quantifying fuzziness with a float is deprecated. Internally, Lucene converts the float to an int between 0 and 2 anyway, so we should use this integer (Damerau-Levenshtein) edit distance in our search terms right from the beginning. So my final proposal is:

names_txt:("markus foss"~2) AND names_txt:(markus~1 foss~1)
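
If the query is sent from client code, the two clauses can be assembled and URL-encoded along these lines. This is only a sketch: the host, port and collection name (`mycollection`) are placeholders, and `build_select_url` is a made-up helper, not part of any Solr client library.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_select_url(base_url, field, terms, slop=2, max_edits=1):
    """Combine a proximity clause and a fuzzy clause into one Solr q parameter."""
    phrase = " ".join(terms)
    fuzzy = " ".join(f"{t}~{max_edits}" for t in terms)
    q = f'{field}:("{phrase}"~{slop}) AND {field}:({fuzzy})'
    # urlencode takes care of escaping quotes, spaces, colons and tildes
    return base_url + "?" + urlencode({"q": q, "wt": "json"})


# Placeholder endpoint; adjust to your own Solr installation.
url = build_select_url("http://localhost:8983/solr/mycollection/select",
                       "names_txt", ["markus", "foss"])
print(url)
```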

(For those who are interested: The deprecated, somewhat quirky function that converts the similarity float to an edit distance int can be found at the end of this code file.)
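
For illustration, here is a Python port of that conversion as I remember it from the Lucene 4.x sources (where, if I recall correctly, it was `FuzzyQuery.floatToEdits`). Treat it as a sketch and check the actual source for your version; the Python names are my own.

```python
# Lucene's Levenshtein automata support at most 2 edits.
MAX_SUPPORTED_DISTANCE = 2


def float_to_edits(minimum_similarity, term_len):
    """Map a legacy similarity float to an integer edit distance (0..2)."""
    if minimum_similarity >= 1.0:
        # Values >= 1 are already edit distances; just cap them.
        return int(min(minimum_similarity, MAX_SUPPORTED_DISTANCE))
    if minimum_similarity == 0.0:
        # 0 means exact match, not an unlimited number of edits.
        return 0
    # Otherwise scale by term length, truncate, and cap.
    return min(int((1.0 - minimum_similarity) * term_len), MAX_SUPPORTED_DISTANCE)


for term in ("markus", "foss"):
    print(term, float_to_edits(0.7, len(term)))
```

Under this reading, both `markus~0.7` and `foss~0.7` end up as an edit distance of 1, which is why `~1` is used in the query above.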

Upvotes: 2

Persimmonium

Reputation: 15791

I think you could do that using SpanQuery. The issue is that the usual query parsers in Solr don't support span queries. Look at this article, which mentions the parsers that do support spans: Surround, Xml-Query-Parser and Qsol. But check the status of each in the current Solr version.

Upvotes: 0
