Smoki

Reputation: 561

Solr search query : Given word with numbers in neighborhood

I just found out that Solr can find words that occur within a given distance of another word, like this:

text_original : "word1 word2"~10

So Solr finds word1 wherever word2 occurs within a distance of at most 10 words.

great, YAY

But now I want to do the same with numbers that are not known in advance. I want to look for numbers that occur within a given distance of some keywords. As a regex I would write something like this:

myWord(\s)+(([A-Za-z]+)\s){0,10}([0-9]{3,12}(\.|\,)[0-9]{1,4})

or something like that.
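To make the intent concrete, here is a minimal plain-Java sketch of what that pattern accepts when run over raw text. The class name, the helper `found`, and the sample strings are made up for illustration; note that the pattern only covers a number that follows the keyword, not one that precedes it.

```java
import java.util.regex.Pattern;

public class KeywordNumberRegex {
    // The pattern from above with Java string escaping; "myWord" is a placeholder keyword.
    static final Pattern P = Pattern.compile(
        "myWord(\\s)+(([A-Za-z]+)\\s){0,10}([0-9]{3,12}(\\.|\\,)[0-9]{1,4})");

    // Hypothetical helper: true if the keyword is followed by at most 10
    // alphabetic words and then a decimal number.
    static boolean found(String text) {
        return P.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(found("myWord costs about 1234.50 today")); // true
        System.out.println(found("myWord has no number nearby"));      // false
    }
}
```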

So I thought it would be easy to do this in Solr, similar to the word proximity search:

text_original: Word1 /[0-9]{3,12}/~10

But the two terms are now combined with OR, so I find numbers OR my given word. And I can't put them in quotation marks, because then the regex no longer works.

Can anyone give me a hint on how these search terms have to be arranged so that the query works as described?

Upvotes: 0

Views: 414

Answers (1)

femtoRgon

Reputation: 33341

You can do this through the ComplexPhraseQueryParser (in Solr it can be selected with the {!complexphrase} local parameter), with a query like:

text_original:"Word1 /[0-9]{3,12}/"~10

Keep in mind that a regex query in Lucene must match the whole term, so /[0-9]{3,12}/ only matches terms that consist entirely of 3 to 12 digits. This query would not match "word1 word2", but it would match "word1 extra stuff 200". Slop also seemed a bit odd in my testing.
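As a rough illustration of the whole-term rule, here is a plain-Java simulation. It does not use Lucene at all: whitespace splitting stands in for the analyzer, and a simple token-position window stands in for slop, which Lucene computes differently. The class name and the helper `near` are hypothetical.

```java
import java.util.regex.Pattern;

public class ProximitySketch {
    // Hypothetical helper: true if `word` and a token that FULLY matches `termRegex`
    // occur within `maxDistance` token positions of each other, in either order.
    static boolean near(String text, String word, String termRegex, int maxDistance) {
        String[] tokens = text.toLowerCase().split("\\s+"); // crude stand-in for an analyzer
        Pattern p = Pattern.compile(termRegex);
        String target = word.toLowerCase();
        for (int i = 0; i < tokens.length; i++) {
            if (!tokens[i].equals(target)) continue;
            int from = Math.max(0, i - maxDistance);
            int to = Math.min(tokens.length - 1, i + maxDistance);
            for (int j = from; j <= to; j++) {
                // matches() requires the entire token to match, like a regexp against a term
                if (j != i && p.matcher(tokens[j]).matches()) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(near("word1 extra stuff 200", "Word1", "[0-9]{3,12}", 10)); // true
        System.out.println(near("word1 word2", "Word1", "[0-9]{3,12}", 10));           // false
        System.out.println(near("word1 costs 1234.50", "Word1", "[0-9]{3,12}", 10));   // false
    }
}
```

The last case is the one to watch for: if your analyzer keeps "1234.50" as a single term, a digits-only regex will not match it, and the regex has to describe the full term instead.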

If you are willing to fall back on writing a raw Lucene query, you can also accomplish this with the SpanQuery API, for example:

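// Span classes live in org.apache.lucene.search.spans (org.apache.lucene.queries.spans since Lucene 9).
// SpanTermQuery is not analyzed: pass the term exactly as it is indexed (e.g. lowercased).
SpanQuery wordQuery = new SpanTermQuery(new Term("text_original", "Word1"));
// RegexpQuery takes a Term; the pattern must match the whole indexed term.
SpanQuery numQuery = new SpanMultiTermQueryWrapper<>(new RegexpQuery(new Term("text_original", "[0-9]{3,12}")));
// Slop of 10, inOrder = false: the number may come before or after the word.
Query proxQuery = new SpanNearQuery(new SpanQuery[] {wordQuery, numQuery}, 10, false);
searcher.search(proxQuery, numHits);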

Upvotes: 1
