Reputation: 21
Using Lucene, I can create a document, put values in their respective fields, and then use a searcher to find matches in the indexed documents.
What I need now is the number of matches within a particular field of each document. Knowing that a match exists is fine, but I would also like to know how many times the pattern was found in that field.
Example:
Document doc = new Document();
doc.add(new Field("TNAME", "table_one", Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("CNAME", "column_one", Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("DATA", "This would be the data found in this particular field of a single document", Field.Store.NO, Field.Index.ANALYZED));
If I wanted to perform a search on the "DATA" field to find out how many times the pattern ^d.* is matched in each document, how would I do so? (For the document above the result should be 2, since only "data" and "document" start with "d".)
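To make the expected count concrete, here is a minimal plain-Java sketch that does not use Lucene at all. It assumes the analyzer simply lowercases the text and splits on non-alphanumeric characters, which only approximates what a real Lucene analyzer does; the class and method names (`FieldMatchCounter`, `countMatchingTokens`) are made up for illustration.

```java
import java.util.Locale;
import java.util.regex.Pattern;

class FieldMatchCounter {
    // Counts the tokens of `text` that match `regex` as a whole token.
    // Assumption: tokens are produced by lowercasing and splitting on
    // non-alphanumeric characters, a rough stand-in for an analyzer.
    static int countMatchingTokens(String text, String regex) {
        Pattern pattern = Pattern.compile(regex);
        int count = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && pattern.matcher(token).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String data = "This would be the data found in this particular field of a single document";
        // Only "data" and "document" start with 'd', so this prints 2.
        System.out.println(countMatchingTokens(data, "d.*"));
    }
}
```

This is the number I would like to get back from the index for each document, rather than just a hit or no hit.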
Upvotes: 1
Views: 327
Reputation: 21
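I found a simple answer: enumerate the terms that match the regex with RegexTermEnum, then read each term's per-document frequency from TermDocs: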
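// Pre-4.0 Lucene API; the regex classes come from Lucene's contrib packages.
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.regex.JavaUtilRegexCapabilities;
import org.apache.lucene.search.regex.RegexTermEnum;

IndexSearcher searcher = new IndexSearcher(directory);
IndexReader reader = searcher.getIndexReader();
// "field" is the indexed field to scan (e.g. "DATA" in the question); "d.*" is tested against each term.
RegexTermEnum regexTermEnum = new RegexTermEnum(reader, new Term("field", "d.*"), new JavaUtilRegexCapabilities());
do {
    System.out.println("Next:");
    System.out.println("\tDoc Freq: " + regexTermEnum.docFreq());
    if (regexTermEnum.term() != null) {
        System.out.println("\t" + regexTermEnum.term());
        TermDocs td = reader.termDocs(regexTermEnum.term());
        while (td.next()) {
            // td.freq() is how often this one term occurs in the document;
            // "name" must be a stored field (e.g. "TNAME" in the question).
            System.out.println("Found " + td.freq() + " matches in document " + reader.document(td.doc()).get("name"));
        }
        td.close();
    }
} while (regexTermEnum.next());
regexTermEnum.close();
System.out.println("End.");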
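Note that the loop above reports one frequency per matching term, so a document containing both "data" and "document" shows up twice with a count of 1 each. To get a single number per document, the per-term frequencies can be summed. The following is only a sketch of that summing step: it uses plain Java maps as a stand-in for the index (term, then document id, then frequency), and the names `PerDocumentTotals`, `totalMatchesPerDoc` and `samplePostings` are hypothetical. With Lucene itself, the same accumulation would go inside the `while (td.next())` loop, keyed by `td.doc()`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

class PerDocumentTotals {
    // postings: term -> (document id -> frequency of that term in the document).
    // Returns document id -> total frequency over all terms matching the regex.
    static Map<Integer, Integer> totalMatchesPerDoc(
            Map<String, Map<Integer, Integer>> postings, String regex) {
        Pattern pattern = Pattern.compile(regex);
        Map<Integer, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Map<Integer, Integer>> termEntry : postings.entrySet()) {
            if (!pattern.matcher(termEntry.getKey()).matches()) {
                continue; // term does not match the pattern
            }
            for (Map.Entry<Integer, Integer> docEntry : termEntry.getValue().entrySet()) {
                // Add this term's frequency to the running total for the document.
                totals.merge(docEntry.getKey(), docEntry.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    // Made-up sample data: document 0 is the example from the question.
    static Map<String, Map<Integer, Integer>> samplePostings() {
        Map<String, Map<Integer, Integer>> postings = new HashMap<>();
        postings.computeIfAbsent("data", k -> new HashMap<>()).put(0, 1);
        postings.computeIfAbsent("document", k -> new HashMap<>()).put(0, 1);
        postings.computeIfAbsent("document", k -> new HashMap<>()).put(1, 3);
        postings.computeIfAbsent("field", k -> new HashMap<>()).put(0, 1);
        return postings;
    }

    public static void main(String[] args) {
        // Document 0: data(1) + document(1) = 2; document 1: document(3) = 3.
        System.out.println(totalMatchesPerDoc(samplePostings(), "d.*"));
    }
}
```

For the sample data this gives a total of 2 for document 0, which is the result asked for in the question.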
Upvotes: 1