George

Reputation: 8378

Lucene: delete from index, based on multiple fields

I need to delete a document from a Lucene search index. The standard approach:

indexReader.deleteDocuments(new Term("field_name", "field value"));

won't do the trick, because I need to delete based on multiple fields. I need something like this:

(pseudo code)
TermAggregator terms = new TermAggregator();
terms.add(new Term("field_name1", "field value 1"));
terms.add(new Term("field_name2", "field value 2"));
indexReader.deleteDocuments(terms.toTerm());

Are there any constructs for that?

Upvotes: 2

Views: 1784

Answers (2)

Luke Machowski

Reputation: 4211

Choice of Analyzer

First of all, check which analyzer you are using. I was stumped for a while before realising that the StandardAnalyzer filters out common words like 'the' and 'a', which is a problem when your field has the value 'A'. You might want to consider the KeywordAnalyzer instead.

See this post about the analyzer.

// Create an analyzer:
// NOTE: We want the keyword analyzer so that it doesn't strip or alter any terms:
// In our example, the Standard Analyzer removes the term 'A' because it is a common English word.
// https://stackoverflow.com/a/9071806/231860
KeywordAnalyzer analyzer = new KeywordAnalyzer();
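
To make the pitfall concrete, here is a small plain-Java toy model. It is not Lucene code: the class name, the method names, and the abbreviated stop list are all made up for illustration, and real analyzers do considerably more. It only mimics the two behaviours described above, namely that a stop-word-filtering analyzer can reduce the value 'A' to no terms at all, while a keyword-style analyzer keeps the whole value as a single unchanged term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model, NOT Lucene's analyzers: it only imitates the behaviour described above.
final class AnalyzerSketch {

    // Hypothetical, abbreviated stop list used purely for illustration.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "the");

    // Imitates a stop-word analyzer: split on non-alphanumerics, lowercase, drop stop words.
    static List<String> stopWordTokens(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.split("[^\\p{Alnum}]+")) {
            String token = part.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Imitates a keyword-style analyzer: the whole value is one unchanged token.
    static List<String> keywordTokens(String value) {
        return List.of(value);
    }
}
```

In this model, stopWordTokens("A") yields an empty list, so a query built from it has nothing to match against, whereas keywordTokens("A") still contains the single term "A".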

Query Parser

Next, you can either create your query using the QueryParser:

See this post about overriding the default operator.

// Create a query parser without a default field in this example (the first argument):
QueryParser queryParser = new QueryParser("", analyzer);

// Optionally, set the default operator to be AND (we leave it the default OR):
// https://stackoverflow.com/a/9084178/231860
// queryParser.setDefaultOperator(QueryParser.Operator.AND);

// Parse the query:
Query multiTermQuery = queryParser.parse("field_name1:\"field value 1\" AND field_name2:\"field value 2\"");

Query API

Or you can achieve the same by constructing the query yourself using their API:

See this tutorial on creating the BooleanQuery.

BooleanQuery multiTermQuery = new BooleanQuery();
multiTermQuery.add(new TermQuery(new Term("field_name1", "field value 1")), BooleanClause.Occur.MUST);
multiTermQuery.add(new TermQuery(new Term("field_name2", "field value 2")), BooleanClause.Occur.MUST);
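
Since the difference between AND and OR decides which documents get deleted, it may help to see it spelled out. The following is a plain-Java toy model, not Lucene code: documents are simple maps of field name to value, and the class and method names are hypothetical. It sketches why every clause should be required (as with BooleanClause.Occur.MUST, or an explicit AND in the parsed query) when deleting by multiple fields, and what would happen under OR semantics instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy in-memory model, NOT Lucene: each "document" is a map of field name to value.
final class MultiFieldDeleteSketch {

    // True if the document holds every field/value pair (every clause is required).
    static boolean matchesAll(Map<String, String> doc, Map<String, String> criteria) {
        for (Map.Entry<String, String> e : criteria.entrySet()) {
            if (!e.getValue().equals(doc.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }

    // True if the document holds at least one field/value pair (OR semantics).
    static boolean matchesAny(Map<String, String> doc, Map<String, String> criteria) {
        for (Map.Entry<String, String> e : criteria.entrySet()) {
            if (e.getValue().equals(doc.get(e.getKey()))) {
                return true;
            }
        }
        return false;
    }

    // Returns the documents that survive the deletion.
    static List<Map<String, String>> deleteMatching(
            List<Map<String, String>> docs, Map<String, String> criteria, boolean requireAll) {
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            boolean hit = requireAll ? matchesAll(doc, criteria) : matchesAny(doc, criteria);
            if (!hit) {
                kept.add(doc);
            }
        }
        return kept;
    }
}
```

With three documents where only the first matches both fields and the second matches just field_name1, requiring all clauses removes one document, while OR semantics would remove two. That is worth double-checking before running a delete against a real index.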

Numeric Field Queries (Int etc...)

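When the key fields are numeric (for example, indexed as IntFields), a TermQuery won't work; you must use a NumericRangeQuery instead, with the lower and upper bounds both set to the value you want to match.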

See the answer to this question.

// NOTE: For IntFields, we need NumericRangeQueries:
// https://stackoverflow.com/a/14076439/231860
BooleanQuery multiTermQuery = new BooleanQuery();
multiTermQuery.add(NumericRangeQuery.newIntRange("field_name1", 1, 1, true, true), BooleanClause.Occur.MUST);
multiTermQuery.add(NumericRangeQuery.newIntRange("field_name2", 2, 2, true, true), BooleanClause.Occur.MUST);

Delete the Documents that Match the Query

Finally, we pass the query to the IndexWriter to delete every document that matches it:

See the answer to this question.

// Remove the document by using a multi key query:
// http://www.avajava.com/tutorials/lessons/how-do-i-combine-queries-with-a-boolean-query.html
writer.deleteDocuments(multiTermQuery);

Upvotes: 1

Avi

Reputation: 20152

IndexWriter has methods that allow more powerful deleting, such as IndexWriter.deleteDocuments(Query). You can build a BooleanQuery that is the conjunction of the terms identifying the documents you wish to delete, and pass that.

Upvotes: 2
