Reputation: 4619
I want to ensure that a task (especially one that operates on a single entity) gets added to a push queue at most once, until the previously added task has finished. After that, I should be able to add the same task, for the same entity, again.
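A simple example is a task X that updates entity A. I want to be able to:
1. add task X for entity A to the queue;
2. have any further attempt to add task X for entity A rejected while that task is still pending;
3. add task X for entity A again once the previous one has finished.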
The simple solution seems to be to use a task name that incorporates both the name of the task X and the unique ID of entity A.
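A minimal sketch of that naming scheme, assuming the format `<task>-<entity id>` and the task-name character rule `^[a-zA-Z0-9_-]{1,500}$` (App Engine's documented pattern as far as I know; verify it against the current docs). `dedup_task_name` and `update_entity` are hypothetical names used only for illustration:

```python
import re

# Assumption: task names must match this pattern (letters, digits, '_' and '-',
# 1 to 500 characters). Check the current Task Queue docs before relying on it.
_VALID_NAME = re.compile(r"^[a-zA-Z0-9_-]{1,500}$")


def dedup_task_name(task_kind: str, entity_id: str) -> str:
    """Combine the task kind (X) and the entity's ID (A) into one task name.

    Two adds with the same (task_kind, entity_id) produce the same name, so the
    queue's name de-duplication rejects the second one.
    """
    name = f"{task_kind}-{entity_id}"
    if not _VALID_NAME.match(name):
        raise ValueError(f"not a valid task name: {name!r}")
    return name


print(dedup_task_name("update_entity", "A"))  # update_entity-A
```

Raising on invalid characters, rather than silently replacing them, avoids two different entity IDs collapsing into the same task name.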
However, I think this approach doesn't satisfy condition 3: task names get "tombstoned" for a period I can't control and can't be re-used until then.
From the docs:
An advantage of assigning your own task names is that named tasks are de-duplicated, which means you can use task names to guarantee* that a task is only added once. De-duplication continues for 9 days after the task is completed or deleted.
Does this mean task names can't be re-used for 9 days?
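To make the quoted behaviour concrete, here is a toy in-memory model (not the App Engine API) of a queue that de-duplicates by name and keeps a tombstone for 9 days after a task finishes. `ToyNamedQueue`, `TaskAlreadyExists`, `TaskTombstoned` and `try_add` are hypothetical stand-ins; the 9-day window is taken from the quoted docs:

```python
from datetime import datetime, timedelta

# Retention window from the quoted docs; the real service may differ.
TOMBSTONE_PERIOD = timedelta(days=9)


class TaskAlreadyExists(Exception):
    """A task with this name is still in the queue."""


class TaskTombstoned(Exception):
    """A task with this name finished or was deleted too recently."""


class ToyNamedQueue:
    """In-memory stand-in for a queue that de-duplicates tasks by name."""

    def __init__(self):
        self.pending = set()   # names currently in the queue
        self.finished_at = {}  # name -> time the task completed or was deleted

    def add(self, name: str, now: datetime) -> None:
        if name in self.pending:
            raise TaskAlreadyExists(name)
        done = self.finished_at.get(name)
        if done is not None and now - done < TOMBSTONE_PERIOD:
            raise TaskTombstoned(name)
        self.pending.add(name)

    def finish(self, name: str, now: datetime) -> None:
        self.pending.discard(name)
        self.finished_at[name] = now


def try_add(q: ToyNamedQueue, name: str, now: datetime) -> str:
    try:
        q.add(name, now)
        return "added"
    except TaskAlreadyExists:
        return "already queued"
    except TaskTombstoned:
        return "tombstoned"


q = ToyNamedQueue()
t0 = datetime(2020, 1, 1)
print(try_add(q, "update_entity-A", t0))                       # added
print(try_add(q, "update_entity-A", t0))                       # already queued
q.finish("update_entity-A", t0 + timedelta(minutes=1))
print(try_add(q, "update_entity-A", t0 + timedelta(days=1)))   # tombstoned
print(try_add(q, "update_entity-A", t0 + timedelta(days=10)))  # added
```

The third call is the case that breaks condition 3: the previous task has finished, yet the same name is still refused.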
Upvotes: 0
Views: 218
Reputation: 2618
I had this use case in the past, where I needed to do lots of small updates to a single entity, but the updates did not need to be reflected immediately. I solved it by batching the updates in a pull queue and having a cron job run every X minutes to pull a number of tasks and do a batch update. In my case, the cron job simply enqueues a task to a push queue; that task then consumes from the pull queue and does a transactional update.
Reference doc: https://cloud.google.com/datastore/docs/articles/fast-and-reliable-ranking-in-datastore/
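The batching pattern can be sketched with in-memory stand-ins: a `deque` plays the pull queue and a dict plays the datastore. `enqueue_update`, `batch_worker`, `pull_queue` and `store` are hypothetical names, and real leasing and transactions are not modeled:

```python
from collections import defaultdict, deque

# Hypothetical stand-ins: 'pull_queue' for the pull queue, 'store' for the datastore.
pull_queue = deque()
store = {"A": 0, "B": 0}


def enqueue_update(entity_id: str, delta: int) -> None:
    """Producer side: record a small update instead of writing the entity directly."""
    pull_queue.append((entity_id, delta))


def batch_worker(max_tasks: int = 100) -> int:
    """Body of the push task that the cron job enqueues every X minutes.

    Pulls up to 'max_tasks' queued updates, folds them per entity, and writes
    each entity once (the real version would use one transaction per entity
    group). Returns the number of queued updates consumed.
    """
    totals = defaultdict(int)
    consumed = 0
    while pull_queue and consumed < max_tasks:
        entity_id, delta = pull_queue.popleft()
        totals[entity_id] += delta
        consumed += 1
    for entity_id, delta in totals.items():
        store[entity_id] = store.get(entity_id, 0) + delta  # one write per entity
    return consumed


for d in (1, 2, 3):
    enqueue_update("A", d)
enqueue_update("B", 5)
print(batch_worker())  # 4
print(store)           # {'A': 6, 'B': 5}
```

With a real pull queue you would lease the tasks and delete them only after the write commits, so that a failed batch is retried instead of losing updates.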
Upvotes: 1
Reputation: 39824
Indeed, task names can't be re-used for 9 days after they are no longer in the queue. This is probably a safety measure, to ensure that all traces of previous identically named tasks have been flushed from the entire distributed infrastructure.
You could encode the current timestamp, rounded to the full second, in the task name, which would limit your actual write rate to one per second (which is the maximum average write rate to a single entity group anyway). If you fail to enqueue the task because it is already in the queue, try to enqueue one for the next second instead (unless you have some other way of triggering another update task). But put the timestamp towards the end of the task name, not the beginning, to avoid the performance implications mentioned in the same doc you referenced.
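A minimal sketch of the timestamp-suffix idea. `add(name, eta_second)` is a hypothetical wrapper around the real enqueue call that returns `False` when the name is already taken (queued or tombstoned); `toy_add` below fakes it with a set, and `timestamped_name` and `enqueue_rate_limited` are illustrative names:

```python
from typing import Callable, Optional, Set


def timestamped_name(task_kind: str, entity_id: str, second: int) -> str:
    """Put the timestamp at the END of the name, as the answer advises."""
    return f"{task_kind}-{entity_id}-{second}"


def enqueue_rate_limited(
    add: Callable[[str, int], bool],
    task_kind: str,
    entity_id: str,
    now: float,
    max_attempts: int = 3,
) -> Optional[str]:
    """Try the current whole second first, then the following seconds.

    At most one name exists per entity per second, which caps the rate at 1/s.
    """
    first = int(now)  # floor to the full second (now is a positive Unix time)
    for second in range(first, first + max_attempts):
        name = timestamped_name(task_kind, entity_id, second)
        if add(name, second):  # ask for the task to run no earlier than 'second'
            return name
    return None  # every candidate name was taken; the caller decides what to do


# Toy 'add' backed by a set of names that have ever been used (ignores the ETA).
used: Set[str] = set()


def toy_add(name: str, eta_second: int) -> bool:
    if name in used:
        return False
    used.add(name)
    return True


print(enqueue_rate_limited(toy_add, "update_entity", "A", 1000.4))  # update_entity-A-1000
print(enqueue_rate_limited(toy_add, "update_entity", "A", 1000.9))  # update_entity-A-1001
```

Capping the retries at a few seconds keeps a burst of triggers from scheduling tasks far into the future; the single queued task for a given second is expected to pick up all pending changes when it runs.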
Upvotes: 1