How to ensure isolation with non-ancestor query

Question

I want to create a user with ndb, as shown below:

def create_user(self, google_id, ....):
  user_keys = UserInformation.query(UserInformation.google_id == google_id).fetch(keys_only=True)

  if user_keys:  # check whether the user already exists
    # already created
    ...(SNIP)...
  else:
    # create a new user entity
    UserInformation(
      # the key is left incomplete, so the datastore assigns the ID
      google_id = google_id,
      facebook_id = None,
      twitter_id = None,
      name = 
      ...(SNIP)...
    ).put()

If this function is called twice at the same time, two users are created (isolation is not ensured between the query and put()).

So I added @ndb.transactional to the function above, but then the following error occurred:

BadRequestError: Only ancestor queries are allowed inside transactions.

How can I ensure isolation with a non-ancestor query?

Dan Cornilescu · Accepted Answer

The ndb library doesn't allow non-ancestor queries inside transactions. So if you make create_user() transactional you get the above error because you call UserInformation.query() inside it (without an ancestor).

If you really want to do that you'd have to place all your UserInformation entities inside the same entity group by specifying a common ancestor and make your query an ancestor one. But that has performance implications, see Ancestor relation in datastore.

Otherwise, even if you split the function in two (a non-transactional part making the query, followed by a transactional one just creating the user), which would avoid the error, you'll still be facing the datastore's eventual consistency, which is actually the root cause of your problem: the query may not immediately return a recently added entity because it takes some time for the index corresponding to the query to be updated. That leaves room for creating duplicate entities for the same user. See Balancing Strong and Eventual Consistency with Google Cloud Datastore.
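To make the index lag concrete, here is a minimal sketch in plain Python. It does not use ndb at all: ToyDatastore, query_keys() and apply_index_updates() are hypothetical names for an in-memory stand-in that only mimics the behaviour described above, namely that lookups by key see a write immediately while a query sees it only once the index has caught up.

```python
class ToyDatastore(object):
    """In-memory stand-in (not ndb) whose query index lags behind writes."""

    def __init__(self):
        self.entities = {}   # key -> entity dict, always up to date
        self.index = {}      # google_id -> set of keys, updated lazily
        self._pending = []   # index updates that have not been applied yet
        self._next_id = 1

    def put(self, entity):
        key = self._next_id
        self._next_id += 1
        self.entities[key] = entity
        # The write is committed, but the index only learns about it later.
        self._pending.append((entity["google_id"], key))
        return key

    def get(self, key):
        """Lookup by key: sees every committed write immediately."""
        return self.entities.get(key)

    def query_keys(self, google_id):
        """Query: only sees what the index has caught up with."""
        return sorted(self.index.get(google_id, set()))

    def apply_index_updates(self):
        """Simulates the index eventually catching up."""
        for google_id, key in self._pending:
            self.index.setdefault(google_id, set()).add(key)
        del self._pending[:]


def create_user(store, google_id):
    """Query-then-put, the same shape as the function in the question."""
    if store.query_keys(google_id):
        return None  # the query found an existing user
    return store.put({"google_id": google_id})


store = ToyDatastore()
first = create_user(store, "g-123")
second = create_user(store, "g-123")  # index is stale, so the query finds nothing
store.apply_index_updates()
duplicates = store.query_keys("g-123")  # both entities show up once the index is current
```

Note that the two calls here run strictly one after the other, so no transaction or locking around create_user() would change the outcome: the second call simply cannot see the first entity through the query yet.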

One possible approach would be to check later/periodically if there are duplicates and remove them (possibly merging the info inside them into a single entity). In addition or instead, you could mark the user creation as "in progress", record the newly created entity's key and keep querying until the key appears in the query result, at which point you finally mark the entity creation as "done" (you might not have time to do that inside the same request).
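A rough sketch of what such a cleanup pass could look like, in plain Python and operating on dicts rather than real entities. Everything here is an assumption for illustration: the function name plan_duplicate_cleanup, the "key" and "created" fields, and the merge rule (keep the oldest entity and fill its empty fields from the newer duplicates). Your merge rule may well need to be different.

```python
from collections import defaultdict


def plan_duplicate_cleanup(users):
    """Group users by google_id, keep the oldest, merge in missing fields.

    `users` is a list of dicts; each is assumed to carry a "key", a
    "google_id" and a "created" timestamp (hypothetical field names).
    Returns (survivors, keys_to_delete) without modifying the input dicts.
    """
    groups = defaultdict(list)
    for user in users:
        groups[user["google_id"]].append(user)

    survivors = []
    keys_to_delete = []
    for group in groups.values():
        ordered = sorted(group, key=lambda u: u["created"])
        keeper = dict(ordered[0])  # copy, so the caller's data is untouched
        for dup in ordered[1:]:
            for field, value in dup.items():
                # Only fill gaps; never overwrite what the keeper already has.
                if keeper.get(field) is None and value is not None:
                    keeper[field] = value
            keys_to_delete.append(dup["key"])
        survivors.append(keeper)
    return survivors, keys_to_delete


users = [
    {"key": 1, "google_id": "g-123", "created": 10, "name": "Ann", "twitter_id": None},
    {"key": 2, "google_id": "g-123", "created": 12, "name": None, "twitter_id": "t-9"},
    {"key": 3, "google_id": "g-456", "created": 11, "name": "Bob", "twitter_id": None},
]
survivors, keys_to_delete = plan_duplicate_cleanup(users)
```

In a real job you would then write the merged survivors back and delete the listed keys, ideally re-reading each entity by key first, since the query that found the duplicates may itself be slightly stale.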

Another approach would be (if possible) to determine an algorithm to obtain a (unique) key based on the user information and just check if an entity with such key exists instead of making a query. Key lookups are strongly consistent and can be done inside transactions, so that would solve your duplicates problem. For example you could use the google_id as the key ID. Just an example, as that's not ideal either: you may have users without a google_id, users may want to change their google_id without losing other info, etc. Maybe also track the user creation in progress in the session info to prevent repeated attempts to create the same user in the same session (but that won't help with attempts from different sessions).
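The key-based idea can be sketched without ndb as follows. ToyKeyValueStore and user_key_id are hypothetical names, and the lock is only a stand-in for the isolation a datastore transaction would give you. In ndb itself, the corresponding building blocks would be, as far as I know, a lookup like ndb.Key(UserInformation, key_id).get() followed by put() inside an @ndb.transactional function, or the UserInformation.get_or_insert(key_id, ...) class method.

```python
import threading


class ToyKeyValueStore(object):
    """In-memory stand-in (not ndb) for strongly consistent lookups by key."""

    def __init__(self):
        self._entities = {}
        self._lock = threading.Lock()  # stand-in for transactional isolation

    def get_or_insert(self, key_id, **fields):
        """Return (entity, created); the check and the write happen atomically."""
        with self._lock:
            entity = self._entities.get(key_id)
            if entity is not None:
                return entity, False
            entity = dict(fields)
            self._entities[key_id] = entity
            return entity, True

    def count(self):
        with self._lock:
            return len(self._entities)


def user_key_id(provider, external_id):
    """Derive a deterministic key ID; the "provider:id" format is an assumption."""
    return "%s:%s" % (provider, external_id)


store = ToyKeyValueStore()
results = []


def worker():
    # Every caller computes the same key, so they all contend for one slot.
    _, created = store.get_or_insert(user_key_id("google", "123"), google_id="123")
    results.append(created)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

created_count = results.count(True)  # exactly one caller actually created the user
```

Prefixing the ID with the provider is just one way to leave room for users who sign up through something other than Google. It does not address the other caveat above: a user who changes their google_id would still need their entity moved to a new key.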
