Sarmad Tabassum
Sarmad Tabassum

Reputation: 38

Can changes in synonyms.txt file take effect without reindex?

We are using Sunspot-solr 4.0 when I update synonyms file it does not change anything in search. Do I really need to re-index after making changes in synonyms.txt or there is any other trick to update synonyms file that I am missing?

Upvotes: 0

Views: 1714

Answers (1)

MatsLindh
MatsLindh

Reputation: 52802

That depends on when you're expanding the synonyms. If you're expanding at query time, the updates will be visible without any reindexing, but if you're expanding at index time (which is the recommended way), you'll have to reindex to get the new synonyms included in the index.

The reasoning behind recommending expansion at index time compared to query time is described in the old wiki:

This is because there are two potential issues that can arrise at query time:

The Lucene QueryParser tokenizes on white space before giving any text to the Analyzer, so if a person searches for the words sea biscit the analyzer will be given the words "sea" and "biscit" seperately, and will not know that they match a synonym.

Phrase searching (ie: "sea biscit") will cause the QueryParser to pass the entire string to the analyzer, but if the SynonymFilter is configured to expand the synonyms, then when the QueryParser gets the resulting list of tokens back from the Analyzer, it will construct a MultiPhraseQuery that will not have the desired effect. This is because of the limited mechanism available for the Analyzer to indicate that two terms occupy the same position: there is no way to indicate that a "phrase" occupies the same position as a term. For our example the resulting MultiPhraseQuery would be "(sea | sea | seabiscuit) (biscuit | biscit)" which would not match the simple case of "seabiscuit" occuring in a document

Even when you aren't worried about multi-word synonyms, idf differences still make index time synonyms a good idea. Consider the following scenario:

An index with a "text" field, which at query time uses the SynonymFilter with the synonym TV, Televesion and expand="true" Many thousands of documents containing the term "text:TV" A few hundred documents containing the term "text:Television" A query for text:TV will expand into (text:TV text:Television) and the lower docFreq for text:Television will give the documents that match "Television" a much higher score then docs that match "TV" comparably -- which may be somewhat counter intuitive to the client. Index time expansion (or reduction) will result in the same idf for all documents regardless of which term the original text contained.

There's an really detailed explanation of what's actually happening behind the scenes available in Better synonym handling in Solr.

As long as you're aware of these issues and the trade-off, doing query time synonyms could work fine - but you'll have to test it against your queries and what you expect the results to be - and be aware of the pitfalls.

Upvotes: 3

Related Questions