Anuar Sharafudinov

Reputation: 109

Lucene Sample Query

When I search for the phrase "ph1 ph2", it finds texts that contain "ph1" or "ph2".

    String line = "ph1 ph2";
    QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, field, analyzer);
    Query query = parser.parse(line);

Does anybody know how to search by:

1) an exact phrase ("ph1 ph2"). Example match: This is sentence ph1 ph2.
2) a phrase with a maximum distance ("ph1 ph2"~3). Example match: This ph1 is sentence ph2.

P.S. I used the standard Lucene indexer to index my files. If this example is not clear, see http://www.lucenetutorial.com/lucene-query-syntax.html

Here's the full code:

    String index = "C:/programs/lucenedemo/index";
    String field = "contents";
    IndexReader reader = DirectoryReader.open(FSDirectory.open(new File(index)));
    IndexSearcher searcher = new IndexSearcher(reader);
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
    String line = "ph1 ph2";
    // Use the same Version for the parser as for the analyzer.
    QueryParser parser = new QueryParser(Version.LUCENE_40, field, analyzer);
    Query query = parser.parse(line);

    // doPagingSearch
    TopDocs results = searcher.search(query, 300000);
    ScoreDoc[] hits = results.scoreDocs;
    System.out.println(results.totalHits);

    // Print at most the first 10 hits; there may be fewer than 10.
    for (int i = 0; i < Math.min(10, hits.length); i++) {
        Document doc = searcher.doc(hits[i].doc);
        String path = doc.get("path");
        if (path != null) System.out.println((i + 1) + ". " + path);
    }
    // end of doPagingSearch

    reader.close();

Upvotes: 1

Views: 8138

Answers (2)

femtoRgon

Reputation: 33341

I'm not clear on exactly what you are looking for, but I believe it's one of:

  • "field:\"" + line + "\"" : Simple phrase query. Finds the two terms adjacent and in order.

  • "field:\"" + line + "\"~3" : Phrase query with slop. Matches the two terms with up to three positions of separation between them. Slop is counted in position moves, so a slop of 2 or more also matches the two terms in reverse order.

  • "field:(" + line + ")" : Not a phrase query at all. A simple search for the two terms; any order or distance is acceptable.

You can find further query parser syntax options in Lucene's query syntax documentation.
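As a rough illustration, the three query strings can be built with plain string concatenation before being handed to parser.parse(...). This is only a sketch: PhraseQueryStrings and its methods are hypothetical helpers (not part of Lucene), the field name "contents" is taken from the question, and the backslash-escaping of embedded quotes assumes the classic query parser's escape rules.

```java
/** Hypothetical helper (not part of Lucene) that builds query-parser strings. */
final class PhraseQueryStrings {

    private PhraseQueryStrings() {}

    /** Backslash-escapes backslashes and double quotes so the text can sit inside "...". */
    static String escapeQuotes(String text) {
        return text.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds field:"text", an exact phrase query string. */
    static String phrase(String field, String text) {
        return field + ":\"" + escapeQuotes(text) + "\"";
    }

    /** Builds field:"text"~slop, a phrase query string with slop. */
    static String sloppyPhrase(String field, String text, int slop) {
        if (slop < 0) {
            throw new IllegalArgumentException("slop must be >= 0");
        }
        return phrase(field, text) + "~" + slop;
    }

    /** Builds field:(text), matching the terms in any order and at any distance. */
    static String anyTerms(String field, String text) {
        return field + ":(" + text + ")";
    }

    public static void main(String[] args) {
        String line = "ph1 ph2";
        System.out.println(phrase("contents", line));          // contents:"ph1 ph2"
        System.out.println(sloppyPhrase("contents", line, 3)); // contents:"ph1 ph2"~3
        System.out.println(anyTerms("contents", line));        // contents:(ph1 ph2)
        // Each of these strings can then be passed to parser.parse(...) as in the question.
    }
}
```

Because the parser in the question is already constructed with a default field, the field prefix can presumably be dropped when searching that same field.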

Upvotes: 1

yclevine

Reputation: 990

You may want to use a SpanQuery.

Specifically, you can create a SpanNearQuery, passing the constructor an array of SpanTermQuery objects (one for each term in the phrase), an int representing the "slop", or maximum distance, and a boolean indicating whether the terms must be in order.

To search, use the getSpans method on the query that you have created.

Note that this will give you a list of all such occurrences, and not a list of matching documents. Depending on how you would like to present the results, you may need to iterate over the spans and group them according to document, etc.
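The grouping step could look roughly like the sketch below. It is an illustration only: SpanHit is a hypothetical holder for the (document id, start position, end position) values you would copy out while iterating over the spans, and neither it nor SpanGrouping is a Lucene class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Lucene) that groups span matches by document. */
final class SpanGrouping {

    private SpanGrouping() {}

    /** Hypothetical value holder: one matched span inside one document. */
    static final class SpanHit {
        final int doc;   // document id
        final int start; // position of the first matched term
        final int end;   // position just past the last matched term

        SpanHit(int doc, int start, int end) {
            this.doc = doc;
            this.start = start;
            this.end = end;
        }
    }

    /** Groups hits by document id, keeping documents in first-seen order. */
    static Map<Integer, List<SpanHit>> groupByDocument(List<SpanHit> hits) {
        Map<Integer, List<SpanHit>> byDoc = new LinkedHashMap<>();
        for (SpanHit hit : hits) {
            byDoc.computeIfAbsent(hit.doc, d -> new ArrayList<>()).add(hit);
        }
        return byDoc;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the occurrences collected from the spans.
        List<SpanHit> hits = new ArrayList<>();
        hits.add(new SpanHit(0, 3, 5));
        hits.add(new SpanHit(0, 10, 12));
        hits.add(new SpanHit(4, 1, 3));

        for (Map.Entry<Integer, List<SpanHit>> e : groupByDocument(hits).entrySet()) {
            System.out.println("doc " + e.getKey() + ": " + e.getValue().size() + " match(es)");
        }
    }
}
```

A LinkedHashMap is used so that documents come out in the order their first occurrence was seen; a TreeMap would sort them by document id instead.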

Upvotes: 1
