adesh singh

Reputation: 1727

How to delete Documents from a Lucene Index using Term or QueryParser

I am trying to delete documents from a Lucene index. I want to delete only one specified file from the index.

The following program deletes documents from an index that can be searched with KeywordAnalyzer, but the file name I need can only be found with StandardAnalyzer. Is there any way to apply StandardAnalyzer to my Term, or, instead of a Term, how can I use QueryParser to delete documents from the Lucene index?

    try {
        File INDEX_DIR = new File("D:\\merge lucene\\abc\\");
        Directory directory = FSDirectory.open(INDEX_DIR);

        // Open the reader with readOnly = false so that it is allowed to delete
        IndexReader indexReader = IndexReader.open(directory, false);
        Term term = new Term("path", "fileindex23005.htm");
        int l = indexReader.deleteDocuments(term);
        indexReader.close();

        System.out.println("documents deleted");
    } catch (Exception x) {
        x.printStackTrace();
    }

Upvotes: 5

Views: 16907

Answers (3)

Gerhard Powell

Reputation: 6175

As @dillippattnaik pointed out, multiple terms result in OR. I have updated his code to make it AND using BooleanQuery:

BooleanQuery query = new BooleanQuery
{
   { new TermQuery( new Term( "year", "2016" ) ), Occur.MUST },
   { new TermQuery( new Term( "STATE", "TX" ) ), Occur.MUST },
   { new TermQuery( new Term( "CITY", "CITY NAME" ) ), Occur.MUST }
};

indexWriter.DeleteDocuments( query );

Upvotes: 0

dillip

Reputation: 1842

Adding this for future reference, for anyone who, like me, has deleteDocuments on IndexWriter: you can use

indexWriter.deleteDocuments(Term... terms)

instead of deleteDocuments(query), which is less hassle if you only need to match one field. Be aware that this method treats multiple terms as an OR condition: it matches any of the terms and deletes every matching document. The code below matches stored documents whose STATE field is Tx and deletes them.

  indexWriter.deleteDocuments(
        new Term("STATE", "Tx")
      );

To combine different fields with an AND condition, use the following code:

 BooleanQuery.Builder builder = new BooleanQuery.Builder();

 // Note: year is stored as an int, not as a String, when the document is created.
 // A Term here would need "2016" as a String, which would not match documents that store year as an int.
 Query yearQuery = IntPoint.newExactQuery("year", 2016);
 Query stateQuery = new TermQuery(new Term("STATE", "TX"));
 Query cityQuery = new TermQuery(new Term("CITY", "CITY NAME"));

 builder.add(yearQuery, BooleanClause.Occur.MUST);
 builder.add(stateQuery, BooleanClause.Occur.MUST);
 builder.add(cityQuery, BooleanClause.Occur.MUST);

 indexWriter.deleteDocuments(builder.build());
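
To make the OR versus AND difference concrete, here is a minimal plain-Java sketch. It is only a toy model and does not use Lucene: documents are maps of field name to exact value, and the class name, the helper methods (deleteAny, deleteAll) and the sample data are all hypothetical. deleteAny stands in for passing several terms to deleteDocuments, deleteAll for a BooleanQuery whose clauses are all MUST.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class DeleteSemanticsSketch {

    // Hypothetical stand-in for stored documents: field name -> exact indexed value.
    static List<Map<String, String>> sampleDocs() {
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(Map.of("STATE", "TX", "CITY", "AUSTIN"));
        docs.add(Map.of("STATE", "TX", "CITY", "DALLAS"));
        docs.add(Map.of("STATE", "CA", "CITY", "AUSTIN"));
        return docs;
    }

    // Hypothetical stand-in for the terms passed to a delete call.
    static List<Map.Entry<String, String>> sampleTerms() {
        return List.of(Map.entry("STATE", "TX"), Map.entry("CITY", "AUSTIN"));
    }

    // A term matches when the document holds exactly that value in that field.
    static boolean matches(Map<String, String> doc, Map.Entry<String, String> term) {
        return term.getValue().equals(doc.get(term.getKey()));
    }

    // Models the OR behaviour: a document is removed if ANY term matches.
    static int deleteAny(List<Map<String, String>> docs, List<Map.Entry<String, String>> terms) {
        int before = docs.size();
        docs.removeIf(doc -> terms.stream().anyMatch(t -> matches(doc, t)));
        return before - docs.size();
    }

    // Models the AND behaviour: a document is removed only if EVERY term matches.
    static int deleteAll(List<Map<String, String>> docs, List<Map.Entry<String, String>> terms) {
        int before = docs.size();
        docs.removeIf(doc -> terms.stream().allMatch(t -> matches(doc, t)));
        return before - docs.size();
    }

    public static void main(String[] args) {
        System.out.println("OR delete removed:  " + deleteAny(sampleDocs(), sampleTerms())); // 3
        System.out.println("AND delete removed: " + deleteAll(sampleDocs(), sampleTerms())); // 1
    }
}
```

With the sample data, the OR-style delete removes all three documents, because each one matches at least one term. The AND-style delete removes only the Austin, TX document.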

Upvotes: 2

femtoRgon

Reputation: 33341

I assume you are using Lucene 3.6 or before, otherwise IndexReader.deleteDocuments no longer exists. You should, however, be using IndexWriter instead, anyway.

If you can only find the document using query parser, then just run a normal query, then iterate through the documents returned, and delete them by docnum, along the lines of:

Query query = queryParser.parse("My Query!");
ScoreDoc[] docs = searcher.search(query, 100).scoreDocs;
for (ScoreDoc doc : docs) {
    indexReader.deleteDocument(doc.doc);
}

Or better yet (simpler, uses non-defunct, non-deprecated functionality), just use an IndexWriter, and pass it the query directly:

Query query = queryParser.parse("My Query!");
writer.deleteDocuments(query);
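
A likely reason the raw Term does not match is that a Term is compared against the indexed tokens as-is, with no analysis, while QueryParser runs the query text through the analyzer first. Below is a rough plain-Java sketch of that difference. It does not use Lucene, and the analyze method is a simplified, hypothetical stand-in (lower-case, split on non-alphanumerics), not the actual StandardAnalyzer rules; all class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AnalyzedTermSketch {

    // Simplified stand-in for an analyzer: lower-cases the text and splits it
    // on anything that is not a letter or digit. NOT the real StandardAnalyzer.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Models a raw Term lookup: the term text is compared unchanged
    // against the tokens that were written to the index.
    static boolean rawTermMatches(List<String> indexedTokens, String termText) {
        return indexedTokens.contains(termText);
    }

    // Models a parsed query (a simplification): the query text goes through
    // the same analyzer, and every resulting token must be present.
    static boolean analyzedQueryMatches(List<String> indexedTokens, String queryText) {
        List<String> queryTokens = analyze(queryText);
        return !queryTokens.isEmpty() && indexedTokens.containsAll(queryTokens);
    }

    public static void main(String[] args) {
        List<String> indexed = analyze("fileindex23005.htm");
        System.out.println("indexed tokens: " + indexed);
        System.out.println("raw term matches: " + rawTermMatches(indexed, "fileindex23005.htm"));
        System.out.println("analyzed query matches: " + analyzedQueryMatches(indexed, "fileindex23005.htm"));
    }
}
```

In this model the whole file name is never stored as a single token, so the raw lookup fails while the analyzed query succeeds. The same idea is why passing a parsed Query to the writer tends to work where a hand-built Term does not.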

Upvotes: 14
