ar099968

Reputation: 7537

Elasticsearch: delete by query is really slow on a lot of documents to delete

I'm using the delete-by-query plugin for Elasticsearch.

I have an index, products, with an integer field, size. I want to delete all documents with size 10, and there are over 5000 of them. If I try:

DELETE /products/product/_query?q=size:10

This query takes over 2 minutes.

I understand why: the delete-by-query plugin is slow by design. From the documentation:

Internally, it uses Scroll and Bulk APIs to delete documents in an efficient and safe manner. It is slower [..] Queries which match large numbers of documents may run for a long time, as every document has to be deleted individually.
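To see why the cost grows with the number of matches, here is a toy, in-memory sketch of the scroll-then-bulk pattern the documentation describes. It is not the plugin's code: the dict-based "index", the scroll and delete_by_query helpers and the page_size of 500 are all hypothetical. It only illustrates that batching requests does not remove the per-document work: every matching document still costs one delete operation.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical stand-in for an index: document id -> source fields.
Index = Dict[str, Dict[str, int]]


def scroll(index: Index, match: Callable[[Dict[str, int]], bool],
           page_size: int) -> Iterator[List[str]]:
    """Yield pages of matching ids, like successive scroll responses."""
    # Snapshot the matches first, loosely mimicking a scroll context, so that
    # deleting documents while paging does not disturb the iteration.
    matches = [doc_id for doc_id, doc in index.items() if match(doc)]
    for start in range(0, len(matches), page_size):
        yield matches[start:start + page_size]


def delete_by_query(index: Index, match: Callable[[Dict[str, int]], bool],
                    page_size: int = 500) -> Tuple[int, int]:
    """Return (bulk requests sent, individual delete operations performed)."""
    bulk_requests = 0
    delete_ops = 0
    for page in scroll(index, match, page_size):
        bulk_requests += 1          # one bulk request per scroll page...
        for doc_id in page:         # ...but still one delete per document
            del index[doc_id]
            delete_ops += 1
    return bulk_requests, delete_ops


if __name__ == "__main__":
    products = {f"p{i}": {"size": 10 if i < 5000 else 20} for i in range(12000)}
    print(delete_by_query(products, lambda doc: doc["size"] == 10))
    # (10, 5000)
```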

How can I mass-delete documents faster?

Upvotes: 8

Views: 7513

Answers (2)

mike rodent

Reputation: 15642

ES 8.11, 2024-01

I don't know what the situation was in 2016, but maybe you could consider doing a bulk delete.

The downside is that it can be quite complicated to determine the _ids of all the Lucene documents (index documents) you need to delete. Typically you have to run a _search query to find these _ids on the basis of your query, and you must have them to do a bulk delete.
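A minimal sketch of that first step, assuming the ids are collected with a term query on size and returned without _source. The helper names are hypothetical; the dicts are just the request and response bodies for POST /products/_search, to be sent with whatever HTTP or Elasticsearch client you use. Note that a single search returns at most index.max_result_window hits (10,000 by default), which covers the 5000 documents here; larger result sets would need search_after or scroll.

```python
from typing import Any, Dict, List


def build_id_search(field: str, value: Any, size: int = 10000) -> Dict[str, Any]:
    """Search body that returns only the _id of each matching document."""
    return {
        "query": {"term": {field: value}},  # exact match, e.g. size == 10
        "_source": False,                   # skip the document bodies
        "size": size,                       # capped by index.max_result_window
    }


def extract_ids(response: Dict[str, Any]) -> List[str]:
    """Pull the _id values out of a _search response body."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


if __name__ == "__main__":
    import json
    print(json.dumps(build_id_search("size", 10, size=5000)))
    fake_response = {"hits": {"hits": [{"_id": "a1"}, {"_id": "b2"}]}}
    print(extract_ids(fake_response))  # ['a1', 'b2']
```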

Then you have the faff of building a bulk request body that conforms to the strict newline-delimited JSON format required. It's quite feasible once you get the hang of it, and these bulk operations are pretty fast.
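For the bulk body itself: each delete action is a single JSON line with no document line after it, and the whole body must end with a newline. A hedged sketch (build_bulk_delete is a hypothetical helper; its output would be POSTed to /_bulk with Content-Type: application/x-ndjson):

```python
import json
from typing import Iterable, List


def build_bulk_delete(index: str, ids: Iterable[str]) -> str:
    """Build an NDJSON _bulk body with one delete action per document id."""
    lines: List[str] = [
        json.dumps({"delete": {"_index": index, "_id": doc_id}}) for doc_id in ids
    ]
    # Delete actions take no source line; the body must end with a newline.
    return ("\n".join(lines) + "\n") if lines else ""


if __name__ == "__main__":
    print(build_bulk_delete("products", ["a1", "b2"]), end="")
```

For a very long list of ids you would typically split it into several bulk requests rather than sending one huge body.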

Upvotes: 0

bittusarkar

Reputation: 6357

You can't. This is the only supported way of deleting documents by query in the latest versions of Elasticsearch. Elasticsearch 1.x deletes much faster (but potentially in an unsafe manner), so if speed really matters that much, you could go back to an older version of Elasticsearch.

Upvotes: 6
