Yaara Gazit

Reputation: 75

How can I get the terms of a Lucene document field after it has been analyzed?

I'm using Lucene 5.1.0. After analyzing and indexing a document, I would like to get a list of all the indexed terms that belong to that specific document.

{
        File[] files = FILES_TO_INDEX_DIRECTORY.listFiles();
        for (File file : files) {
            Document document = new Document();
            Reader reader = new FileReader(file);
            document.add(new TextField("fieldname", reader));
            iwriter.addDocument(document);
        }

        iwriter.close();
        IndexReader indexReader = DirectoryReader.open(directory);
        int maxDoc = indexReader.maxDoc();
        for (int i = 0; i < maxDoc; i++) {
            Document doc = indexReader.document(i);
            String[] terms = doc.getValues("fieldname");
        }
}

The terms array comes back null. Is there a way to get the saved terms per document?

Upvotes: 1

Views: 1027

Answers (1)

Yaara Gazit

Reputation: 75

Here is sample code for the answer, using a TokenStream:

    TokenStream ts = analyzer.tokenStream("myfield", reader);
    // The Analyzer class will construct the Tokenizer, TokenFilter(s), and CharFilter(s),
    // and pass the resulting Reader to the Tokenizer.
    OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
    CharTermAttribute charTermAttribute = ts.addAttribute(CharTermAttribute.class);

    try {
        ts.reset(); // Resets this stream to the beginning. (Required)
        while (ts.incrementToken()) {
            // Use AttributeSource.reflectAsString(boolean)
            // for token stream debugging.
            System.out.println("token: " + ts.reflectAsString(true));
            String term = charTermAttribute.toString();
            System.out.println(term);
        }
        ts.end();   // Perform end-of-stream operations, e.g. set the final offset.
    } finally {
        ts.close(); // Release resources associated with this stream.
    }
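
If you want the terms as a list rather than printed, the same loop can be wrapped in a small helper. This is only a minimal sketch, assuming Lucene 5.x with the lucene-analyzers-common jar on the classpath. The class name AnalyzedTerms and the method analyze are hypothetical names chosen for illustration, not part of Lucene, and StandardAnalyzer is just an example choice; use the same analyzer you index with.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class (not part of Lucene).
public class AnalyzedTerms {

    // Runs the text through the analyzer and collects each emitted term.
    static List<String> analyze(Analyzer analyzer, String field, String text) throws IOException {
        List<String> terms = new ArrayList<>();
        // TokenStream is Closeable, so try-with-resources replaces the finally block.
        try (TokenStream ts = analyzer.tokenStream(field, text)) {
            CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                    // required before the first incrementToken()
            while (ts.incrementToken()) {
                terms.add(termAtt.toString());
            }
            ts.end();                      // end-of-stream bookkeeping
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: StandardAnalyzer stands in for whatever analyzer the IndexWriter uses.
        try (Analyzer analyzer = new StandardAnalyzer()) {
            List<String> terms = analyze(analyzer, "fieldname", "Quick brown Fox");
            System.out.println(terms);
        }
    }
}
```

Note that this re-analyzes the original text instead of reading anything back from the index, so it only matches the indexed terms if the analyzer and the input are the same ones used at indexing time.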

Upvotes: 1
