Myster

Can I clear the stop word list in Lucene.Net so that exact matches work better?

When handling exact matches, I'm given a real-world query like this:

"not in education, employment, or training"

Converted to a Lucene query with stop words removed, it becomes:

+Content:"? ? education employment ? training" 

Here's a more contrived example:

"there is no such thing"

Converted to a Lucene query with stop words removed, it becomes:

+Content:"? ? ? ? thing" 

My goal is for searches like these to match only the exact phrase as the user entered it.

Could one solution be to clear the stop word list? Would this have adverse effects? If so, what are they? (My google-fu failed.)

Answers (1)

Shazwazza

This all depends on the analyzer you are using. The StandardAnalyzer uses a stop word list and strips those words out; in fact, the StandardAnalyzer gets its default stop words from the StopAnalyzer.

Use the WhitespaceAnalyzer (it only splits on whitespace, so it neither removes stop words nor lower-cases terms), or create your own analyzer by inheriting from the one that most closely suits your needs and modifying it.

Alternatively, if you like the StandardAnalyzer, you can create a new instance with a custom stop word list:

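using System.Collections.Generic;
using Lucene.Net.Analysis;          //StopAnalyzer
using Lucene.Net.Analysis.Standard; //StandardAnalyzer
using Version = Lucene.Net.Util.Version;

//This is the default stop word list, in case you want to reuse or filter it
var defaultStopWords = StopAnalyzer.ENGLISH_STOP_WORDS_SET;

//Create a new StandardAnalyzer with custom stop words
var sa = new StandardAnalyzer(
    Version.LUCENE_29, //depends on your Lucene.Net version
    new HashSet<string> //pass in your own stop word list
    {
        "hello",
        "world"
    });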
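To see where the "?" placeholders in the question come from, here is a small, self-contained C# sketch that mimics a stop filter which keeps the original token positions. It is an illustration under stated assumptions, not Lucene.Net code: the StopWordPhraseDemo class and its ToPhrase method are hypothetical, the stop list is assumed to be the classic 33-word English list exposed as StopAnalyzer.ENGLISH_STOP_WORDS_SET, and the [a-z0-9]+ split only roughly approximates the StandardAnalyzer's tokenizer.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Illustration only (hypothetical helper, NOT part of Lucene.Net): mimics a stop
// filter that keeps position increments, so removed words leave "holes".
public static class StopWordPhraseDemo
{
    // Assumed to mirror the classic English list behind StopAnalyzer.ENGLISH_STOP_WORDS_SET.
    public static readonly HashSet<string> EnglishStopWords = new HashSet<string>
    {
        "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
        "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
        "the", "their", "then", "there", "these", "they", "this", "to", "was",
        "will", "with"
    };

    // Lower-cases the text, splits it into runs of ASCII letters/digits and
    // replaces every stop word with "?" so the remaining terms keep their positions.
    public static string ToPhrase(string text, ISet<string> stopWords)
    {
        var tokens = Regex.Matches(text.ToLowerInvariant(), "[a-z0-9]+")
                          .Cast<Match>()
                          .Select(m => m.Value)
                          .Select(t => stopWords.Contains(t) ? "?" : t);
        return string.Join(" ", tokens);
    }
}
```

With the default list, ToPhrase("not in education, employment, or training", StopWordPhraseDemo.EnglishStopWords) produces the same "? ? education employment ? training" shape as the question's query, while an empty HashSet<string> keeps every word. Because a phrase query only checks the relative positions of the terms it kept, the stop-word version would also match text such as "education, employment and training". The trade-off of clearing the list is that common words get indexed, so the index grows, and the same analyzer should be used at index and query time, which means re-indexing existing content.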
