Vivek Jha

Reputation: 1580

Achieving Strong Consistency Using get_or_insert

I have a model like this:

from google.appengine.ext import ndb


class UserModel(ndb.Model):
    ''' model class which stores all the user information '''
    fname = ndb.StringProperty(required=True)
    lname = ndb.StringProperty(required=True)
    sex = ndb.StringProperty(required=True, choices=['male', 'female'])
    age = ndb.IntegerProperty(required=True)
    dob = ndb.DateTimeProperty(required=True)
    email = ndb.StringProperty(default=None)
    mobile = ndb.StringProperty(required=True)
    city = ndb.StringProperty(required=True)
    state = ndb.StringProperty(required=True)

None of the above fields is unique, not even email, because many people may not have an email ID. So I am using the following logic to create a string ID:

1. Take the first two letters of 'state' and change them to upper case.
2. Take the first two letters of 'city' and change them to upper case.
3. Get the count of all records in the database and increment by one.
4. Append all of them together.

I am using get_or_insert for inserting the entity.

Adding a user will not happen often, but any kind of clash would be catastrophic: the probability of contention is low, but its impact is very high.

My questions are:

1. Will using get_or_insert guarantee that I will never have duplicate IDs?
2. The get_or_insert documentation says "Transactionally retrieves an existing
   entity or creates a new one." How can something perform an operation
   "transactionally" without using an ancestor query?

PS: For several reasons I can't keep all the user entities in the same entity group.

Upvotes: 0

Views: 135

Answers (2)

Patrick Costello

Reputation: 3626

In order to provide transactionality, get_or_insert uses a Datastore transaction. A query inside a transaction must be an ancestor query; however, transactions can also do gets and puts by key, which don't require a parent to be set on the entity.

However, as @Greg mentioned, you absolutely do not want to use this scheme for generating user IDs. In particular, doing a count on your db is incredibly slow, will not scale, and is eventually consistent. Because the query is eventually consistent, it may return a count smaller than the actual count whenever recent writes are not yet visible to it (which for a large app will be all the time). A stale count yields an ID that is already taken, so you could wait several hours before an insert would actually succeed.
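To make the failure mode concrete, here is a minimal sketch using an in-memory stand-in for the Datastore. ToyStore, build_id, stale_count, and the names Alice and Bob are all hypothetical, and the lock only imitates the atomic get-then-put; none of this is the real ndb API. What it relies on is that get_or_insert returns the existing entity when the key is already taken:

```python
import threading


class ToyStore(object):
    """In-memory stand-in for the Datastore; NOT the real ndb API."""

    def __init__(self):
        self._entities = {}
        self._lock = threading.Lock()  # imitates the transaction

    def get_or_insert(self, key_name, **props):
        # Atomic get-then-put: an existing entity is never overwritten.
        with self._lock:
            if key_name not in self._entities:
                self._entities[key_name] = dict(props)
            return self._entities[key_name]


def build_id(state, city, count):
    """The question's scheme: state prefix + city prefix + (count + 1)."""
    return '%s%s%d' % (state[:2].upper(), city[:2].upper(), count + 1)


store = ToyStore()
stale_count = 41  # assumption: both requests read the same stale count

id_a = build_id('Karnataka', 'Bangalore', stale_count)
id_b = build_id('Karnataka', 'Bangalore', stale_count)

alice = store.get_or_insert(id_a, fname='Alice')
bob = store.get_or_insert(id_b, fname='Bob')  # gets Alice's entity back
```

Both requests derive the same ID, so only one entity is stored and the second caller is handed the first caller's record. The key stays unique, but unless the caller compares the returned entity with what it tried to insert, the second user's data is silently dropped.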

If you want to provide a customer ID with a State and City, I would recommend doing the following:

  1. Do a put using automatic ids.
  2. Expose to the user a "Customer ID" which is the State + City + ID.
  3. When you want to look up a customer given their "Customer ID", just do a get for the ID portion.
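The steps above could be sketched roughly as follows. The helper names and the exact format (two upper-case ASCII letters each for state and city, followed by the numeric ID) are assumptions for illustration, not something the Datastore prescribes:

```python
import re

# Assumed format: 2-letter state prefix + 2-letter city prefix + numeric ID.
_CUSTOMER_ID_RE = re.compile(r'^([A-Z]{2})([A-Z]{2})(\d+)$')


def make_customer_id(state, city, numeric_id):
    """Build the user-facing ID from the entity's automatic integer ID."""
    return '%s%s%d' % (state[:2].upper(), city[:2].upper(), numeric_id)


def parse_numeric_id(customer_id):
    """Return the integer ID portion, which is all the key lookup needs."""
    match = _CUSTOMER_ID_RE.match(customer_id)
    if match is None:
        raise ValueError('malformed customer ID: %r' % (customer_id,))
    return int(match.group(3))
```

With ndb, the numeric ID would come from the key returned by put() (key.id()), and the lookup would be something like UserModel.get_by_id(parse_numeric_id(customer_id)). A get by key is strongly consistent, so no count query is involved at any point.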

Upvotes: 3

Patrice

Reputation: 4692

If you keep that ID scheme (for which you honestly don't really need steps 1 and 2, just 3), there is no reason for it to create duplicate IDs. With get_or_insert, it'll look for the exact ID you provide and fetch it if it exists, or simply create it if it doesn't, as explained here. So you CANNOT have duplicate IDs (as long as this ID is the key of your entity). If you follow the link provided, it clearly states that:

The get and subsequent (possible) put operations are wrapped in a transaction to ensure atomicity. This means that get_or_insert() will never overwrite an existing entity, and will insert a new entity if and only if no entity with the given kind and name exists.

And the fact that it does this transactionally means it'll lock the entity group to be sure you don't have contention. Since you don't seem to have ancestors, each entity is its own entity group, so I think it'll just lock the entity you're updating.

Upvotes: 0
