Reputation: 3634
I understand that NDB queries (reading) are eventually consistent. But what about deleting and writing?
For example, will running the code below always (eventually) result in exactly one defective widget being in our datastore:
# get rid of defective widgets
defective_widgets = Widget.query(Widget.defective == True).fetch(keys_only=True)
ndb.delete_multi(defective_widgets)
# uh oh, we have a new one
Widget(defective=True).put()
... or is there a possibility that the delete operation could remove the new defective widget?
Upvotes: 1
Views: 113
Reputation: 42018
"... is there a possibility that the delete operation could remove the new defective widget?"
No.
"will running the below code always (eventually) result in one defective widget being in our datastore"
Also no. It is possible that some previous defective widgets were not deleted.
Cloud Datastore is strongly consistent on look-ups (get by key) and on ancestor queries. It is eventually consistent on non-ancestor queries.
Let's look at your example
For arguments sake, let's setup a sample data set to start with. Some code adds new defective widgets:
WidgetA(defective=True).put()
WidgetB(defective=True).put()
WidgetC(defective=True).put()
Now we have:
WidgetA(defective: True)
WidgetB(defective: True)
WidgetC(defective: True)
Doing look-ups (get by key) will always return these 3 entities since look-ups are strongly consistent.
Immediately after these 3 widgets are added, some code wants to get a list of all defective widgets.
defective_widgets = Widget.query(Widget.defective == True).fetch(keys_only=True)
Then issues a call to delete them all:
ndb.delete_multi(defective_widgets)
What data do we now have in the database? The answer: any combination of WidgetA, B, and C; none of them, some of them, or even all of them. Why? Because the query issued was eventually consistent, so whether or not the index updates had been applied yet determines which entity keys ended up in defective_widgets.
So at this point, after you finally execute Widget(defective=True).put(), we could have anywhere between 1 and 4 defective widgets in your database.
How can you avoid this?
Option 1 (Entity Groups): If defective widgets are always written/updated at a rate of 1 transaction per second or lower, you can put them all in a single entity group and modify your query to be an ancestor query; then everything will be strongly consistent. Note: if you can batch widget writes/updates together in a transaction, all those writes count as a single transaction, so, for example, batching 500 widgets together allows you to do 500 widget writes per second.
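A minimal sketch of Option 1 in ndb, assuming the App Engine runtime is available. The kind name WidgetRoot and the key name 'root' are illustrative choices, not part of your model; the only requirement is that every widget is written under the same parent key:

```python
from google.appengine.ext import ndb

class Widget(ndb.Model):
    defective = ndb.BooleanProperty()

# One well-known parent key puts every widget in a single entity group.
WIDGET_ROOT = ndb.Key('WidgetRoot', 'root')

# Writes must specify the parent so the entity joins the group.
Widget(parent=WIDGET_ROOT, defective=True).put()

# An ancestor query against the group is strongly consistent, so this
# fetch is guaranteed to see every write that completed before it.
defective_widgets = Widget.query(
    Widget.defective == True, ancestor=WIDGET_ROOT).fetch(keys_only=True)
ndb.delete_multi(defective_widgets)
```

Keep in mind the 1-write-per-second-per-entity-group limit mentioned above; this trade is only safe if your widget write rate fits under it (or can be batched into transactions).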
Option 2 (soft delete): If your write rate is too high for a single entity group, you can implement a soft-delete system. A single entity (cached; call it the "Version Entity") stores a monotonically increasing version number (an integer). Any new defective widget also stores this number in a property called 'version'. When you want to delete a set of widgets, increment the number in the Version Entity by one and spin up a query for all widgets with a version less than the new version number. In any query that processes widgets, check whether the widget's version is less than the current version number and, if so, discard it from processing (it's soft-deleted). You can also optimize so that queries don't return the majority of the soft-deleted items by including a version = current filter.
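The version-number logic above can be sketched with a small in-memory simulation (plain dicts stand in for datastore entities; the names version_entity, soft_delete_defective, and live_defective_widgets are illustrative, not ndb API):

```python
# Stand-in for the single cached "Version Entity".
version_entity = {"current": 1}

# Stand-ins for widget entities; each records the version it was written at.
widgets = [
    {"name": "WidgetA", "defective": True, "version": 1},
    {"name": "WidgetB", "defective": True, "version": 1},
]

def soft_delete_defective():
    """'Delete' all existing defective widgets by bumping the version."""
    version_entity["current"] += 1

def live_defective_widgets():
    """Only widgets written at (or after) the current version are live."""
    current = version_entity["current"]
    return [w for w in widgets if w["defective"] and w["version"] >= current]

soft_delete_defective()  # WidgetA and WidgetB are now soft-deleted

# A new defective widget is written at the current (incremented) version.
widgets.append({"name": "WidgetD", "defective": True,
                "version": version_entity["current"]})

print([w["name"] for w in live_defective_widgets()])  # ['WidgetD']
```

Note that incrementing the version is a single strongly consistent write, so the "delete" takes effect immediately regardless of how many widgets it covers; the old entities are then cleaned up lazily by the background query.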
Upvotes: 1
Reputation: 2536
If you're creating a new defective widget after the delete operation completed, why would you expect it to be removed by earlier code that already executed?
The .fetch(keys_only=True) on line 2 already pulled the list of existing keys from the datastore (the new defective entity didn't exist at that point), so there is no way for ndb.delete_multi() on the next line to know about the new defective widget you hadn't .put() yet, and thus it won't be deleted.
Upvotes: 0