Reputation: 41

How to override stopwords in Lucene

I am creating a Lucene index in a folder and indexing the content of .txt files. I want the content to be indexed without stopwords. After passing through the analyzer, the stopwords are indeed ignored when searching, but the whole text still ends up in the index. My code is below:

    // Lucene 3.6: SpanishAnalyzer is applied to every ANALYZED field
    IndexWriter writer = new IndexWriter(new SimpleFSDirectory(indexDir),
                                         new SpanishAnalyzer(Version.LUCENE_36),
                                         create,
                                         IndexWriter.MaxFieldLength.UNLIMITED);
    if (!file.isHidden() && file.exists() && file.canRead()) {
        String fileName = file.getName();
        String type = Files.extension(file);
        if (type == null) {
            type = "";
        }
        Document d = new Document();

        d.add(new Field("Name", fileName,
                        Store.YES, Index.ANALYZED, Field.TermVector.YES));
        d.add(new Field("Type", type,
                        Store.YES, Index.ANALYZED));
        if ("txt".equalsIgnoreCase(type) || "log".equalsIgnoreCase(type)) {
            String content = Files.readFromFile(file, "ASCII");
            // Store.YES keeps the raw text; only the indexed terms are analyzed
            d.add(new Field("Content", content,
                            Store.YES, Index.ANALYZED, Field.TermVector.YES));
        }
        // addDocument must be called inside the block where d is declared
        writer.addDocument(d);
    }
    writer.close();
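Files.extension and Files.readFromFile come from a utility class that the code above does not show (it is not java.nio.file.Files). If that library is not at hand, the following is a hypothetical JDK-only stand-in (Java 7+); the class name FileHelpers and its behaviour for names without a dot are assumptions. Note that decoding with "ASCII" will turn accented Spanish characters into replacement characters, so "UTF-8" or "ISO-8859-1" may suit the files better.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;

// Hypothetical stand-ins for the question's Files.extension / Files.readFromFile,
// built only on the JDK.
class FileHelpers {

    // Returns the text after the last '.', or null when there is no extension
    // (the question's code already maps a null result to "").
    static String extension(File file) {
        String name = file.getName();
        int dot = name.lastIndexOf('.');
        return (dot < 0 || dot == name.length() - 1) ? null : name.substring(dot + 1);
    }

    // Reads the whole file into a String using the named character set.
    static String readFromFile(File file, String charsetName) throws IOException {
        byte[] bytes = Files.readAllBytes(file.toPath());
        return new String(bytes, Charset.forName(charsetName));
    }
}
```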

The content of a sample file is "of a to install a directory". If I search for "a", "to" or "of" I find nothing, which means the analyzer has removed them. However, when I inspect the index with Luke, the field contains "to install to a directory of", while its Field.TermVector contains only "install" and "directory", and that is all I want to appear in the field.
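What Luke shows follows from how Lucene treats a field that is both stored and analyzed: with Store.YES the original string is kept verbatim as the stored value, while only the analyzer's output (with stop words removed) is indexed and written to the term vector. Below is a plain-Java toy model of that split, not Lucene code: its hypothetical analyze method only lowercases, splits on whitespace and drops stop words (SpanishAnalyzer also stems), and the stop-word set is just the three words from the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model of Lucene's stored-vs-indexed split. This is NOT Lucene code.
class StoredVsIndexed {

    // Simplified analysis chain: tokenize on whitespace, lowercase, drop stop words.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String content = "of a to install a directory";
        Set<String> stopWords = new HashSet<>(Arrays.asList("of", "a", "to"));

        // Store.YES: the raw value is kept exactly as it was passed in...
        String storedValue = content;
        // ...while only the analyzed terms are indexed (and land in the term vector).
        List<String> indexedTerms = analyze(content, stopWords);

        System.out.println("stored:  " + storedValue);
        System.out.println("indexed: " + indexedTerms);
    }
}
```

Running it prints the unchanged stored string followed by "indexed: [install, directory]".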

Thank you.

Upvotes: 0

Answers (2)

ankitjaininfo

Reputation: 12372

You are using the SpanishAnalyzer constructor that applies the default stop words. You should use the one that takes the stop words as an argument.

Create your IndexWriter as follows:

    IndexWriter writer = new IndexWriter(new SimpleFSDirectory(indexDir),
                                         new SpanishAnalyzer(Version.LUCENE_36, new HashSet<String>()),
                                         create,
                                         IndexWriter.MaxFieldLength.UNLIMITED);

Here we pass an empty set of stop words, which overrides the defaults so that no stop words are removed. You can read more about Lucene stop words here.
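The effect of choosing between the default stop set and an explicit empty one can be sketched with a small plain-Java model (not Lucene). The class StopSetOverride and its DEFAULT_STOP_WORDS are hypothetical; the three-word default set is only a stand-in for the analyzer's real built-in list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model (not Lucene) of passing a stop-word set to an analyzer's constructor.
class StopSetOverride {

    // Hypothetical stand-in for an analyzer's built-in default stop set.
    static final Set<String> DEFAULT_STOP_WORDS =
            new HashSet<>(Arrays.asList("of", "a", "to"));

    private final Set<String> stopWords;

    // Mirrors the two constructor shapes: no set means "use the defaults".
    StopSetOverride() { this(DEFAULT_STOP_WORDS); }
    StopSetOverride(Set<String> stopWords) { this.stopWords = stopWords; }

    // Lowercase, split on whitespace, drop any token found in the stop set.
    List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String content = "of a to install a directory";
        // Default set: stop words are dropped before indexing.
        System.out.println(new StopSetOverride().analyze(content));
        // Empty set: nothing counts as a stop word, so every token is kept.
        System.out.println(new StopSetOverride(new HashSet<String>()).analyze(content));
    }
}
```

This prints [install, directory] for the default set and [of, a, to, install, a, directory] for the empty one.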

Upvotes: 2

mindas

Reputation: 26733

Try using a different constructor for SpanishAnalyzer. Instead of

    new SpanishAnalyzer(Version.LUCENE_36)

use

    new SpanishAnalyzer(Version.LUCENE_36, Collections.emptySet())
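Both answers pass an empty set; the practical difference is that Collections.emptySet() returns a shared, immutable instance, while new HashSet<String>() allocates a fresh, mutable one. A small JDK-only check (the class EmptyStopSets is hypothetical) shows this. On Java 5 to 7, a bare Collections.emptySet() in argument position is inferred as Set<Object>, so if a parameter is declared as Set<String> the explicit witness Collections.<String>emptySet() may be needed; Java 8 target typing removes that need.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Compares the two ways the answers build an "empty stop-word set".
class EmptyStopSets {

    // Returns true if the set rejects modification.
    static boolean isImmutable(Set<String> set) {
        try {
            set.add("probe");
            set.remove("probe"); // undo the change on mutable sets
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Explicit type witness, needed on Java 5-7 when the target is Set<String>.
        Set<String> shared = Collections.<String>emptySet();
        Set<String> fresh = new HashSet<String>();

        System.out.println(shared.isEmpty() + " " + isImmutable(shared)); // true true
        System.out.println(fresh.isEmpty() + " " + isImmutable(fresh));   // true false
    }
}
```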

Upvotes: 1
