Reputation: 49
I'm attempting to perform a bulk delete of documents whose IDs are derived from a previous search. The query that determines the candidates for deletion produces the desired results (thousands of records); however, the bulk delete only deletes 10 records at a time, even though I'm feeding it all of the results of the original query:
    Client client = node.client();
    BulkRequestBuilder bulkRequest = client.prepareBulk();
    SearchResponse deletes = client.prepareSearch("my_index")
            .setTypes("my_doc_type")
            .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
            .setQuery(boolQuery().mustNot(termQuery("tId", transactionId)))
            .execute()
            .actionGet();
    long deleteHits = deletes.getHits().getTotalHits();
    if (deleteHits > 0) {
        logger.info("Preparing to delete (" + deleteHits + ") " +
                "documents from index");
        Arrays.asList(deletes.getHits().getHits()).stream().forEach(h ->
                bulkRequest.add(client.prepareDelete()
                        .setIndex("my_index")
                        .setType("my_doc_type")
                        .setId(h.getId())));
    }
    BulkResponse bulkResponse = bulkRequest.execute().actionGet();
    if (bulkResponse.hasFailures()) {
        throw new RuntimeException(bulkResponse.buildFailureMessage());
    }
}
Upvotes: 1
Views: 3971
Reputation: 30163
By default, a search response returns only the top 10 results. So, while deletes.getHits().getTotalHits() can be in the thousands or even millions, the size of deletes.getHits().getHits() will never be more than the size parameter of your request, which is 10 by default.
A naive approach would be to paginate through the results with a normal search by changing the from parameter. However, this can cause some records to be missed: each request executes a new search, and once the records from the previous page have been deleted, the results of the next search are shifted relative to the previous one.
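The shifting described above can be reproduced without a cluster. The following is a minimal in-memory sketch, not Elasticsearch code: the class and method names are hypothetical, the "index" is just a list of IDs, and it assumes every delete is visible to the next search immediately (in a real cluster that depends on refresh timing).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of from/size paging over an index that shrinks
// as each page is deleted. Not part of any Elasticsearch API.
class OffsetPagingDemo {

    // Returns how many matching documents are left after paging with
    // from = 0, pageSize, 2 * pageSize, ... and deleting every page.
    static int leftoverAfterOffsetPaging(int totalDocs, int pageSize) {
        List<Integer> index = new ArrayList<>();
        for (int id = 0; id < totalDocs; id++) {
            index.add(id);
        }
        int from = 0;
        while (from < index.size()) {
            int to = Math.min(from + pageSize, index.size());
            // Each iteration is a fresh "search" against the current index.
            List<Integer> page = new ArrayList<>(index.subList(from, to));
            index.removeAll(page);   // delete the page we just fetched
            from += pageSize;        // advance the offset as usual
        }
        return index.size();
    }

    public static void main(String[] args) {
        // 25 matching documents, pages of 10: IDs 10..19 are never fetched.
        System.out.println(leftoverAfterOffsetPaging(25, 10)); // prints 10
    }
}
```

After the first page is deleted, the documents that used to sit at positions 10 to 19 move to positions 0 to 9, so a search starting at from = 10 skips them.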
A proper approach is to use the specialized scan and scroll search to paginate through the records. This type of search keeps the results consistent between calls. An example of this approach can be found in the delete by query plugin that will be available in v2.0.
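The shape of such a loop can be sketched as follows. This is an illustration under stated assumptions rather than the plugin's implementation: ScrollCursor, snapshotCursor and deleteAll are hypothetical names, and the cursor is an in-memory stand-in for a scroll over a fixed snapshot. With the 1.x Java client, the cursor would presumably be backed by a search using SearchType.SCAN with setScroll(...), followed by repeated client.prepareSearchScroll(scrollId) calls, and the delete callback by a bulk request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of "scroll, then bulk delete each batch".
class ScrollDeleteSketch {

    // Stand-in for a scroll: each call returns the next batch of IDs from
    // a fixed snapshot, and an empty list once the snapshot is exhausted.
    interface ScrollCursor {
        List<String> nextBatch();
    }

    // Drains the cursor, handing every batch to the delete callback.
    // Returns the number of IDs passed to the callback.
    static int deleteAll(ScrollCursor cursor, Consumer<List<String>> bulkDelete) {
        int deleted = 0;
        List<String> batch;
        while (!(batch = cursor.nextBatch()).isEmpty()) {
            bulkDelete.accept(batch);
            deleted += batch.size();
        }
        return deleted;
    }

    // In-memory cursor: copies the index once, so later deletes cannot
    // shift the batches that are still to come.
    static ScrollCursor snapshotCursor(List<String> liveIndex, int batchSize) {
        List<String> snapshot = new ArrayList<>(liveIndex);
        int[] position = {0};
        return () -> {
            int end = Math.min(position[0] + batchSize, snapshot.size());
            List<String> batch = new ArrayList<>(snapshot.subList(position[0], end));
            position[0] = end;
            return batch;
        };
    }

    // Returns how many documents remain after deleting via the cursor.
    static int leftoverAfterScrollDelete(int totalDocs, int batchSize) {
        List<String> index = new ArrayList<>();
        for (int i = 0; i < totalDocs; i++) {
            index.add("doc-" + i);
        }
        deleteAll(snapshotCursor(index, batchSize), index::removeAll);
        return index.size();
    }

    public static void main(String[] args) {
        System.out.println(leftoverAfterScrollDelete(25, 10)); // prints 0
    }
}
```

Because the batches come from a snapshot rather than from a fresh search each time, deleting one batch does not change which documents appear in the next.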
I should also note that delete by query functionality exists in earlier versions of elasticsearch, and it might seem to be the easiest solution to your problem. I would still recommend using scan/scroll, because the pre-v2.0 implementation of the delete by query API performs poorly and is fragile.
Upvotes: 2
Reputation: 1823
deletes.getHits().getTotalHits() gives you the total number of hits for the search, but the SearchResponse deletes does not contain all of the results.
You'll need to paginate over it, using something like this to define the paging:
client.prepareSearch("my_index").setFrom(from).setSize(pageSize);
Upvotes: 0