How do I read all entities of a kind in a transaction with google cloud datastore nodejs

Question

When I try run a query to read all entities of a kind in a transaction with google datastore it gives me this error

{ Error: Only ancestor queries are allowed inside transactions.
    at /root/src/node_modules/grpc/src/client.js:554:15
  code: 3,
  metadata: Metadata { _internal_repr: {} },

So I need to use an ancestor query. How do I create an ancestor query? It appears to depend on how you structured the hierarchy in datastore. So my next question is, given every entity I have created in datastore has been saved like so (the identifier is unique to the entityData saved)

const entityKey = datastore.key({ namespace: ns, path: [kind, identifier] });
{ key: entityKey, method: 'upsert', data: entityData };

How do I read from the db within a transaction? I think I could do it if I knew the identifiers, but the identifiers are constructed from the entityData that I saved in the kind and I need to read the entities of the kind to figure out what I have in the db (chicken egg problem). I am hoping I am missing something.

More context

The domain of my problem involves sponsoring people. I have stored a kind people in datastore where each entity is a person consisting of a unique identifier, name and grade. I have another kind called relationships where each entity is a relationship containing two of the peoples identifiers, the sponsor & sponsee (linking to people together). So I have structured it like an RDB. If I want to get a persons sponsor, I get all the relationships from the db, loop over them returning the relationships where the person is the sponsee then query the db for the sponsor of that relationship.

How do I structure it the 'datastore' way, with entity groups/ancestors, given I have to model people and their links/relationships.

Let's assume a RDB is out of the question.

Example scenario

Two people have to be deleted from the app/db (let's say they left the company on the same day). When I delete someone, I also want to remove their relationships. The two people I delete share a relationship (one is sponsoring the other). Assume the first transaction is successful i.e. I delete one person and their relationship. Next transaction, I delete one person, then search the relationships for relevant relationships and I find one that has already been deleted because eventually consistent. I try find the person for that relationship and they don't exist. Blows up.

Note: each transaction wraps delete person & their relationship. Multiple people equals multiple transactions.

Scalability is not a concern for my application

Dan Cornilescu · Accepted Answer

Your understanding is correct:

you can't use an ancestor query since your entities are not in an ancestry relationship (i.e. not in the same entity group).
you can't perform non-ancestor queries inside transactions. Note that you also can't read more than 25 of your entities inside a single transaction (each entity is in a separate entity group). From Restrictions on queries:

Queries inside transactions must be ancestor queries

Cloud Datastore transactions operate on entities belonging to up to 25 entity groups, but queries inside transactions must be ancestor queries. All queries performed within a transaction must specify an ancestor. For more information, refer to Datastore Transactions.

The typical approach in a context similar to yours is to perform queries outside transactions, often just keys only queries - to obtain the entity keys, then read the corresponding entities (up to 25 at a time) by key lookup inside transactions. And use transactions only when it's absolutely needed, see, for example, this related discussion: Ancestor relation in datastore.

Your question apparently suggests you're approaching the datastore with a relational DB mindset. If your app fundamentally needs relational data (you didn't describe what you're trying to do) the datastore might not be the best product for it. See Choosing a storage option. I'm not saying that you can't use the datastore with relational data, it can still be done in many cases, but with a bit more careful design - those restrictions are driving towards scalable datastore-based apps (IMHO potentially much more scalable that you can achieve with relational DBs)

There is a difference between structuring the data RDB style (which is OK with the datastore) and using it in RDB style (which is not that good).

In the particular usage scenario you mentioned you do not need to query for the sponsor of a relationship: you already have the sponsor's key in the relationship entity, all you need to do is look it up by key, which can be done in a transaction.

Getting all relationship entities for a person needs a query, filtered by the person being the sponsor or the sponsee. But does it really have to be done in a transaction? Or is it acceptable if maybe you miss in the result list a relationship created just seconds ago? Or having one which was recently deleted? It will eventually (dis)appear in the list if you repeat the query a bit later (see Eventual Consistency on Reading an Index). If that's acceptable (IMHO it is, relationships don't change that often, chances of querying exactly right after a change are rather slim) then you don't need to make the query inside a transaction thus you don't need an ancestry relationship between the people and relationship entities. Great for scalability.

Another consideration: looping through the list of relationship entities: also doesn't necessarily have to be done in a transaction. And, if the number of relationships is large, the loop can hit the request deadline. A more scalable approach is to use query cursors and split the work across multiple tasks/requests, each handling a subset of the list. See a Python example of such approach: How to delete all the entries from google datastore?

For each person deletion case:

add something like a being_deleted property (in a transaction) to that person to flag the deletion and prevent any use during deletion, like creating new relationship while the deletion task is progressing. Add checks for this flag wherever needed in the app's logic (also in transactions).
get the list of all relationship keys for that person and delete them, using the looping technique mentioned above
in the last loop iteration, when there are no relationships left, enqueue another task, generously delayed, to re-check for any recent relationships that might have been missed in the previous loop execution due to the eventual consistency. If any shows up re-run the loop, otherwise just delete the person

If scalability is not a concern, you can also re-design you data structures to use ancestry between all your entities (placing them in the same entity group) and then you could do what you want. See, for example, What would be the purpose of putting all datastore entities in a single group?. But there are many potential risks to be aware of, for example:

max rate of 1 write/sec across the entire entity group (up to 500 entities each), see Datastore: Multiple writes against an entity group inside a transaction exceeds write limit?
large transactions taking too long and hitting the request deadlines, see Dealing with DeadlineExceededErrors
higher risk of contention, see Contention problems in Google App Engine

How do I read all entities of a kind in a transaction with google cloud datastore nodejs

Answers (1)

Related Questions