a_k

Reputation: 121

Mailgun: algorithm for event polling

We are implementing support for tracking Mailgun events in our application. We reviewed the proposed event polling algorithm but are not quite comfortable with it. First, we would prefer not to discard data that we have already fetched and then retry from scratch after a pause. That is inefficient and leaves the door open to a long loop of retries, since it is not clear when the loop is supposed to end. Second, the "threshold age" seems to be the key to determining "trustworthiness", but its value is not defined; only a very generous "half an hour" is suggested.

It is our understanding that events become "trustworthy" after some threshold delay, let us call it D_max, after which they are guaranteed to reside in the event storage. If so, we can implement the algorithm differently, so that we never fetch data that we know is not yet "trustworthy" and we make use of all the data that we do fetch.

We would be fetching data periodically, and on each iteration we would:

  1. Make a request to the events API specifying an ascending time range from T_1 to T_2 = now() - D_max. For the first iteration, T_1 can be set to some time in the past, e.g., half an hour ago. For subsequent iterations, T_1 is set to the value of T_2 from the previous iteration.
  2. Fetch all pages one by one while the next page URL is returned.
  3. Use all fetched events, as they are all "trustworthy".
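The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: `fetch_page` is a hypothetical helper that performs the HTTP request and returns the decoded JSON page, the query parameter names (`begin`, `end`, `ascending`) and the response fields (`items`, `paging.next`) are our reading of the events API and should be checked against the documentation, and stopping on an empty page in addition to a missing next-page URL is a precaution we added.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

Event = Dict[str, Any]
Page = Dict[str, Any]
# Hypothetical helper: fetch_page(url, params) returns one decoded JSON page.
# url is None for the first request of a window; params is None for follow-ups.
FetchPage = Callable[[Optional[str], Optional[Dict[str, Any]]], Page]


def poll_window(fetch_page: FetchPage, t1: float, t2: float) -> Iterator[Event]:
    """Yield every event between t1 and t2, following next-page links."""
    # Parameter names are assumptions about the events API.
    params = {"begin": t1, "end": t2, "ascending": "yes"}
    page = fetch_page(None, params)
    while page.get("items"):
        yield from page["items"]
        next_url = page.get("paging", {}).get("next")
        if not next_url:
            break
        page = fetch_page(next_url, None)


def run_iteration(
    fetch_page: FetchPage, t1: float, now: float, d_max: float
) -> Tuple[List[Event], float]:
    """Run one polling iteration; return (events, T_1 for the next iteration)."""
    t2 = now - d_max
    if t2 <= t1:
        # The window has not opened yet; keep T_1 unchanged and fetch nothing.
        return [], t1
    events = list(poll_window(fetch_page, t1, t2))
    return events, t2
```

A scheduler would call `run_iteration` periodically, persist the returned T_1 only after the events have been stored, and pass it back in on the next run. Whether the range boundaries are inclusive is not something this sketch settles, so an event stamped exactly at T_2 might be seen twice or not at all unless that is verified.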

My questions are:

Thanks!

Upvotes: 12

Views: 760

Answers (1)

Petrogad

Reputation: 4423

1: I see no problems with this solution (in fact, I'm doing something very similar). I'm also storing the IDs of the events to make sure I'm not inserting duplicate entries.
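For illustration, duplicate protection by event ID might look something like the sketch below. It is not my actual code: it assumes each event is a dict with a unique `id` field, and the table layout and the `open_store` / `insert_events` names are made up for the example. It uses SQLite's `INSERT OR IGNORE` against a primary key so that a re-fetched event is silently skipped.

```python
import json
import sqlite3
from typing import Any, Dict, Iterable


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a store whose primary key is the event ID."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "id TEXT PRIMARY KEY, payload TEXT NOT NULL)"
    )
    return conn


def insert_events(conn: sqlite3.Connection, events: Iterable[Dict[str, Any]]) -> int:
    """Insert events, skipping IDs already stored; return how many were new."""
    inserted = 0
    with conn:  # commit on success, roll back on error
        for event in events:
            cur = conn.execute(
                "INSERT OR IGNORE INTO events (id, payload) VALUES (?, ?)",
                (event["id"], json.dumps(event)),  # assumes an "id" field
            )
            inserted += cur.rowcount  # 1 if inserted, 0 if ignored
    return inserted
```

With this in place, overlapping time ranges between iterations are harmless, because the same event is only ever stored once.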

2: I've been working through a similar process. Right now I am testing with D_max set to 10 minutes.

Additionally, while testing, I'm running a nightly task that goes back over the entire day to check a few things:

  • Am I missing any existing metrics?
  • Is there a problem with the assumptions I've made about D_max?
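A rough sketch of what such a nightly check might do is below. It is an assumption-laden illustration, not my actual task: `find_missing` is a made-up name, and it assumes the full-day re-fetch yields dicts with an `id` field and that the IDs already stored are available as a set.

```python
from typing import Any, Dict, Iterable, List, Set


def find_missing(
    stored_ids: Set[str], refetched_events: Iterable[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Return events seen in the full-day re-fetch that were never stored."""
    return [e for e in refetched_events if e["id"] not in stored_ids]


# Example: "c" turned up in the nightly re-fetch but was missed by polling.
stored = {"a", "b"}
refetched = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
missing = find_missing(stored, refetched)
print(len(missing), "event(s) missed by regular polling")
```

Any non-empty result suggests that either D_max is too small or something else in the polling logic is dropping events, which is what the nightly run is meant to surface.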

Upvotes: 4
