Reputation: 741
I need to delete a large number of documents in a 5.5 Elasticsearch cluster. I know the optimal way to do this is to rebuild the cluster without the intended documents, but that's not possible in our case. I run the following query that deletes documents from a subset of the indexes in the cluster:
GET myindex_1*/doc_type/_delete_by_query
{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "typeCode": [
              "Filtered_Type"
            ]
          }
        }
      ],
      "must": [
        {
          "range": {
            "createdDateUTC": {
              "lt": "2017-10-28"
            }
          }
        }
      ]
    }
  }
}
It starts deleting documents for a couple of hours but then just stops and I have to kick it off again. Any ideas why it stops running the delete query?
Just a note: I'm using Kibana to run the query, and the request times out on the client side even though I can see it continues deleting on the backend.
Upvotes: 3
Views: 3437
Reputation: 2162
The Delete by Query API can halt if it runs into conflicting versions of a document. This can happen if a document was updated after the delete by query started but before it reached the document (Elastic documentation).
If you're running the deletion asynchronously, you can fetch the task details after it completes to see if there were any failures (Task API docs).
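As a sketch of that approach (the task id shown is hypothetical; substitute the task value returned by the POST, and reuse the same query body from the question where the body is elided as ...):

POST myindex_1*/doc_type/_delete_by_query?wait_for_completion=false
{
  "query": { ... }
}

GET _tasks/oTUltX4IQMOUUVeiohTt8A:12345

Running with wait_for_completion=false also sidesteps the Kibana client timeout, since the request returns immediately with a task id and the deletion continues in the background. The task status reports counts such as total, deleted, and version_conflicts, and the completed task result lists any failures.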
You can also specify the conflicts=proceed query parameter, which tells the deletion to continue instead of aborting when a conflict is detected; the conflicting documents are skipped (and counted in the version_conflicts field of the response) rather than deleted.
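For example, against the index pattern from the question (a sketch; the query body from the question is elided as ...):

POST myindex_1*/doc_type/_delete_by_query?conflicts=proceed
{
  "query": { ... }
}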
Upvotes: 0
Reputation: 15363
From the Elasticsearch _delete_by_query documentation:
By default _delete_by_query uses scroll batches of 1000. You can change the batch size with the scroll_size URL parameter:
POST twitter/_delete_by_query?scroll_size=5000
{
  "query": {
    "term": {
      "user": "kimchy"
    }
  }
}
You can find more information about batching and batch sizes here:
And since you'll need to scroll through one or more batches to delete all of the documents matched by your query, you can find more information about scrolling here:
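Applied to the query in the question, that would look something like the following (a sketch; 5000 is just an example batch size, and the bool query body from the question is elided as ...):

POST myindex_1*/doc_type/_delete_by_query?scroll_size=5000
{
  "query": { ... }
}

A larger scroll_size means fewer round trips per batch, at the cost of more memory and heavier bulk requests per batch.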
Upvotes: 1