Reputation: 66320
I am trying to port this former db cursor code snippet to ndb (scroll down to "Updating Existing Entities").
I have 2700 records. The idea behind this cursor is to go through all of them and create a new record with a different key for each. I need a snapshot of the state at the time I start, because I don't want the newly created records to show up in the cursor again. In other words, the cursor should only iterate over the 2700 initial records. I am trying to pass the cursor along, but it doesn't work: I end up with some 8500 records, when I was expecting 2700 x 2 = 5400. So it is somehow not working correctly.
import logging

from google.appengine.ext import deferred, ndb

BATCH_SIZE = 100  # ideal batch size may vary based on entity size.

def updateRecordSchema(cursor=None, num_updated=0):
    query = Record.query()
    records, cursor, more = query.fetch_page(BATCH_SIZE, start_cursor=cursor)
    to_put = []
    to_delete = []
    for record in records:
        new_record = Record(parent=record.user,
                            user=record.user,
                            record_date=record.record_date)
        to_put.append(new_record)
        to_delete.append(record)
    if to_put:
        ndb.put_multi(to_put)
        num_updated += len(to_put)
        logging.debug('Put %d entities to Datastore for a total of %d',
                      len(to_put), num_updated)
        deferred.defer(updateRecordSchema, cursor=cursor,
                       num_updated=num_updated)
    else:
        logging.debug('UpdateSchema complete with %d updates!', num_updated)
Upvotes: 1
Views: 180
Reputation: 599610
The problem here is not db vs ndb: you would have exactly the same problem running this code with db. That's because you're adding new entities, unlike the version in the docs which is simply modifying existing ones. Since the query has no ordering, it defaults to key order, so some new entities will be ahead of the cursor position and therefore returned in future runs (new keys are allocated randomly, so not all will be ahead).
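You can see this effect with a toy model of key-order paging — this is plain Python, not ndb, and it assumes ids are allocated uniformly at random (real Datastore auto-ids are scattered, not uniform, so the exact counts differ):

```python
import random

# Toy model of Datastore key-order paging (NOT ndb code): entity keys are
# random integers, the "cursor" is the last key returned, and each page is
# the next 100 keys in key order after the cursor.
random.seed(42)
key_pool = iter(random.sample(range(10**7), 100000))  # unique random keys

store = {next(key_pool) for _ in range(2700)}  # the 2700 initial records

processed = 0
cursor = -1
while True:
    page = sorted(k for k in store if k > cursor)[:100]
    if not page:
        break
    cursor = page[-1]
    processed += len(page)
    for _ in page:
        # Each record seen spawns a copy at a freshly allocated random key.
        store.add(next(key_pool))

# Copies whose key happened to sort after the cursor get revisited and
# copied again, so the store ends up well past the expected 2700 * 2.
print('processed %d records, store now holds %d' % (processed, len(store)))
```

Because the first cursor position sits near the low end of the key space, most freshly allocated keys land after it and get re-processed, which is exactly the runaway count described in the question.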
The best thing to do would be to add a filter on a last_updated field, if you have one in your model, or some other field that can be used to distinguish between old and new records.
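A minimal sketch of that approach, assuming Record has (or is given) a hypothetical last_updated = ndb.DateTimeProperty(auto_now=True) and that you record a migration_start timestamp once before deferring the first batch — it can only run inside the App Engine runtime:

    from google.appengine.ext import deferred, ndb

    def updateRecordSchema(migration_start, cursor=None, num_updated=0):
        # Only entities written before the migration began can match, so the
        # copies (whose last_updated is set at put() time) never re-enter the
        # iteration, no matter where their keys land relative to the cursor.
        query = Record.query(Record.last_updated < migration_start)
        records, cursor, more = query.fetch_page(BATCH_SIZE,
                                                 start_cursor=cursor)
        # ... create and put the copies as before ...
        if more:
            deferred.defer(updateRecordSchema, migration_start,
                           cursor=cursor, num_updated=num_updated)

Note that the inequality filter makes Datastore sort results by last_updated instead of by key, but the cursor works the same way.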
Upvotes: 1