Reputation: 1236
I have a scheduled function that runs every three minutes.
It is supposed to query the database (Firestore) for the relevant users, then send them emails or perform other db actions.
Once it sends an email to a user, it updates the user with a field 'sent_to_today:true'.
If sent_to_today == true, the function won't touch that user for around 24 hours, which is what's intended.
But because I have many users and the function does a lot of work, the next invocation often starts before the current one has set sent_to_today:true on a given user, so it picks up that same user and processes them for sending emails.
This results in some users getting the same email twice.
What are my options to make sure this doesn't happen?
Data Model (simplified):
users (Collection)
--- userId (document)
------ sent_to_today [Boolean]
------ NextUpdateTime [String holding a timestamp in ISO format]
When the function runs, if ("Now" >= NextUpdateTime) && (sent_to_today==false), the user is processed, otherwise, they're skipped.
How do I make sure that the user is only processed by one invocation per day, and not many?
As I said, by the time they're processed by one function invocation (which sets "sent_to_today" to true), the next invocation gets to that user and processes them.
Any help in structuring the data better or using any other logical method would be greatly appreciated.
Here is an idea I'm considering:
Upvotes: 0
Views: 422
Reputation: 2725
Option 1.
Do you think the function could be invoked once every ten minutes, rather than every three minutes? If yes - just modify the scheduler, and make sure that the 'max instances' attribute is set to 1. As the function timeout is only 540 seconds, a 10-minute (600-second) interval is more than enough to avoid overlapping invocations.
Option 2.
When a Firestore document is chosen for processing, the cloud function modifies some attribute (e.g. __state) and sets its value to IN_PROGRESS, for example. When the processing is finished (the email is sent), that attribute value is modified again, to DONE, for example. Thus, if the function picks up a document that has the value IN_PROGRESS in the __state attribute, it simply ignores it and continues to the next one.
The drawback - if the function crashes, there might be documents left in the IN_PROGRESS state, and there should be some mechanism to monitor and resolve such cases.
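A minimal sketch of the claim logic for Option 2, written as plain Python over dictionaries so it can run without Firestore. The names try_claim, mark_done, claimed_at, finished_at, the PENDING default and the 15-minute LEASE are all assumptions made for illustration; only __state, IN_PROGRESS and DONE come from the description above. The lease is one possible way to handle the crash case: an IN_PROGRESS claim older than the lease is treated as abandoned and may be claimed again.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical lease length: an IN_PROGRESS claim older than this is
# treated as abandoned (e.g. the invocation that made it crashed).
LEASE = timedelta(minutes=15)


def try_claim(user: dict, now: datetime) -> Optional[dict]:
    """Return the fields to write if this invocation may claim the user, else None."""
    state = user.get("__state", "PENDING")  # "PENDING" is an assumed default
    if state == "DONE":
        return None  # already processed
    if state == "IN_PROGRESS":
        claimed_at = user.get("claimed_at")  # hypothetical field
        if claimed_at is not None and now - claimed_at < LEASE:
            return None  # another invocation holds a fresh claim
        # No timestamp or an expired lease: fall through and re-claim.
    return {"__state": "IN_PROGRESS", "claimed_at": now}


def mark_done(now: datetime) -> dict:
    """Fields to write once the email has been sent."""
    return {"__state": "DONE", "finished_at": now}


# Usage: two invocations three minutes apart look at the same user.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
user = {"__state": "PENDING"}

first = try_claim(user, now)
user.update(first)  # the first invocation claims the user
second = try_claim(user, now + timedelta(minutes=3))  # skipped: claim is fresh
stale = try_claim(user, now + timedelta(minutes=20))  # allowed: lease expired
```

In a real function, the read of __state and the write of IN_PROGRESS would need to happen atomically, for example inside a Firestore transaction. Otherwise two invocations can both read the old state and both claim the same user. Resetting DONE for the next day is left out of this sketch.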
Option 3.
One cloud function runs through the Firestore collection and, for each document that is to be processed, sends a Pub/Sub message that triggers another cloud function. That second function works with only one Firestore document. Nevertheless, the 'state machine' control is still required (as in Option 2 above). The benefit of Option 3 is a higher level of specialisation between functions, and there may be many 'second' cloud functions running in parallel.
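The shape of Option 3 can be sketched in plain Python, with an in-memory list standing in for the Pub/Sub topic and a dictionary standing in for the collection. The names dispatch, handle_message, the due flag and the userId message field are hypothetical, and the publish and send_email callables are placeholders for the real clients. The worker still checks __state before doing anything, because the same message may be delivered more than once.

```python
import json
from typing import Callable, Dict, List


def dispatch(users: Dict[str, dict],
             is_due: Callable[[dict], bool],
             publish: Callable[[bytes], None]) -> int:
    """First function: publish one message per due user, return the count."""
    count = 0
    for user_id, user in users.items():
        if is_due(user):
            publish(json.dumps({"userId": user_id}).encode("utf-8"))
            count += 1
    return count


def handle_message(data: bytes,
                   users: Dict[str, dict],
                   send_email: Callable[[str], None]) -> bool:
    """Second function: process exactly one user, return True if an email was sent."""
    user_id = json.loads(data)["userId"]
    user = users[user_id]
    if user.get("__state") in ("IN_PROGRESS", "DONE"):
        return False  # duplicate delivery or already handled
    user["__state"] = "IN_PROGRESS"
    send_email(user_id)
    user["__state"] = "DONE"
    return True


# Usage with in-memory stand-ins for the topic and the mailer.
users = {"alice": {"due": True}, "bob": {"due": False}}
queue: List[bytes] = []
sent: List[str] = []

published = dispatch(users, lambda u: u["due"], queue.append)
results = [handle_message(m, users, sent.append) for m in queue]
# Simulate the same message being delivered a second time.
redelivered = handle_message(queue[0], users, sent.append)
```

As with Option 2, the state check and the write of IN_PROGRESS in the worker would have to be atomic in a real deployment (for example, a Firestore transaction); the in-memory version only shows the control flow.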
Upvotes: 2