Counting number of Regex query matches in Document field

Question

Using Lucene, I can figure out how to create a document, put values in their respective fields, and then use a searcher to search the indexed document for matches.

However, I am now more concerned with the number of matches in a particular field of each document. Knowing that there is a match is useful, but I would also like to know how many times the pattern was found in the field.

Example:

Document doc = new Document();
doc.add(new Field("TNAME", "table_one", Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("CNAME", "column_one", Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("DATA", "This would be the data found in this particular field of a single document", Field.Store.NO, Field.Index.ANALYZED));

If I wanted to perform a document search on the "DATA" field to find out how many times the pattern ^d.* is matched, how would I do so? (The result should be 2 for the above document.)

user250117 · Accepted Answer

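I found a simple answer: enumerate the indexed terms that match the regex with RegexTermEnum, then read how often each matching term occurs in each document from TermDocs: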

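// RegexTermEnum and JavaUtilRegexCapabilities come from Lucene's contrib regex package.
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.regex.JavaUtilRegexCapabilities;
import org.apache.lucene.search.regex.RegexTermEnum;

IndexSearcher searcher = new IndexSearcher(directory);
IndexReader reader = searcher.getIndexReader();
// Enumerate every indexed term in the "DATA" field that matches the regex.
RegexTermEnum regexTermEnum = new RegexTermEnum(reader,
        new Term("DATA", "d.*"), new JavaUtilRegexCapabilities());

do {
    Term term = regexTermEnum.term();
    if (term != null) {
        System.out.println(term + " (doc freq: " + regexTermEnum.docFreq() + ")");
        TermDocs td = reader.termDocs(term);
        // freq() counts occurrences of this one term in the document;
        // sum it across all matching terms to get a per-document total.
        while (td.next()) {
            System.out.println("Found " + td.freq() + " matches in document "
                    + reader.document(td.doc()).get("TNAME"));
        }
        td.close();
    }
} while (regexTermEnum.next());
regexTermEnum.close();
System.out.println("End.");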
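
To sanity-check the number the index-based loop should add up to, here is a minimal plain-Java sketch that needs no Lucene at all. It is not the Lucene API: the class name RegexTokenCount and the method countMatches are hypothetical, and lowercasing plus splitting on non-alphanumeric characters is only a rough stand-in for whatever analyzer indexed the "DATA" field. It also assumes a token counts only when the whole token matches the pattern.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: counts regex-matching tokens in a string.
class RegexTokenCount {

    // Lowercases the text, splits it on runs of non-alphanumeric characters
    // (an assumed approximation of an analyzer), and counts the tokens
    // that match the regex in full.
    static int countMatches(String text, String regex) {
        Pattern pattern = Pattern.compile(regex);
        int count = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!token.isEmpty() && pattern.matcher(token).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String data = "This would be the data found in this particular field of a single document";
        // The tokens "data" and "document" match, so this prints 2.
        System.out.println(countMatches(data, "d.*"));
    }
}
```

Because the index stores individual terms, the leading ^ from the question is unnecessary here: each token is tested from its first character anyway.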
