Rusty Rob

Reputation: 17173

Most efficient way to delete ndb query results

Here is my current method:

from google.appengine.ext import ndb

def delete_up_to_10000(query):
    # Fetch up to 10 batches of 1,000 keys and delete each batch.
    for _ in range(10):
        keys = query.fetch(1000, keys_only=True, deadline=40, batch_size=1000)
        ndb.delete_multi(keys)

My question is: is it possible to delete the results of the query without actually having to fetch the keys first? Shouldn't that be possible?

Here are a few decision points around my current solution:

Upvotes: 1

Views: 604

Answers (3)

Rusty Rob

Reputation: 17173

Here's my current solution now:

from google.appengine.ext import ndb

def _delete_from_query(query, limit, batch_size=2000):
    """Delete up to `limit` entities matched by `query`, one page of keys at a time."""
    delete_count = 0
    next_curs = None
    while True:
        # Never ask for more keys than are still needed to reach the limit.
        lim = min(batch_size, limit - delete_count)
        keys, next_curs, more = query.fetch_page(
            lim, start_cursor=next_curs, deadline=40, batch_size=lim, keys_only=True
        )
        ndb.delete_multi(keys)
        delete_count += len(keys)
        if not keys or not more or delete_count == limit:
            break
    return delete_count
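
For illustration, a hypothetical call (the MyModel kind and its created property are assumptions, not part of the original question):

import datetime

# Hypothetical usage: MyModel is an assumed ndb.Model with a "created"
# DateTimeProperty; this deletes at most 10,000 entities older than 30 days.
cutoff = datetime.datetime.utcnow() - datetime.timedelta(days=30)
deleted = _delete_from_query(MyModel.query(MyModel.created < cutoff), limit=10000)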

Upvotes: 1

Andrei Volgin

Reputation: 41089

A keys-only query does not retrieve the entities. It looks only at the indexes, and only at the indexes you specified in the query.

A delete operation, on the other hand, must remove not just the entity itself but also an entry in each and every index for that entity, whether it's a per-property index or a composite index.

Thus, a query simply does not have all the information necessary to perform a delete at the same time. A hypothetical "delete what you find" operation would just be shorthand for "find a list of keys, then use those keys to update all the indexes and remove the entity itself." It might remove some overhead, but at the cost of greater complexity.

Upvotes: 2

Ryan

Reputation: 2542

You need to fetch the keys in order to do the delete. Are you trying to mass-delete and simply spread the work out? You should look into a mapper (i.e. MapReduce). It's perfect for walking through large numbers of datastore entities and deleting them. You could run the map job once a day or week to keep your data under control.
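
As a sketch of that approach (assuming the appengine-mapreduce library is bundled with the app; the handler name and the entity kind it is wired to in mapreduce.yaml are placeholders):

from mapreduce import operation as op

def delete_entity(entity):
    # Called once per entity selected by the DatastoreInputReader configured
    # in mapreduce.yaml; the yielded delete operations are batched by the framework.
    yield op.db.Delete(entity)

A cron entry can then kick the job off daily or weekly, as suggested above.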

Upvotes: 2
