DeMaki

Reputation: 491

Is my Durable Azure Function deterministic?

I am new to Durable Azure Functions and want to verify my understanding of what it means for an orchestration to be deterministic.

The flow is something like this:

  1. Based on a given reference date, flag entries that need to be processed in database table A.
  2. Create and return 'event' data for the flagged entries.
  3. Process these events in parallel (fan-out). Some of the events will result in data being written to database table B.
  4. Once all events are processed (fan-in), generate an export from the data written to database table B.

My current approach is along the lines of the code below. However, the more I think about it, the more I suspect that this approach is not deterministic. Entries flagged in table A by the first activity might no longer be flagged if the activity runs again some time later (e.g. an entry no longer meets the criteria to be flagged). That would mean the list returned by the second activity could also differ if the data in table A has changed.

Would it be sufficient to change my first activity to return the IDs of the entries in table A that are flagged, and use that list as input for the second activity? To me it then looks similar to this example:

https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-cloud-backup?tabs=csharp#e2_backupsitecontent-orchestrator-function

What I don't really understand in all of this is the following: if you rerun the example in the Microsoft docs, E2_GetFileList could return different files, because new files might have been added or existing ones removed. So how is that deterministic?

public class DeterministicOrchestrator
{
    [Function("DeterministicOrchestratorApi")]
    public async Task<HttpResponseData> RunApi(
        [HttpTrigger] HttpRequestData request,
        [DurableClient] DurableTaskClient durableTaskClient)
    {
        var referenceDate = new DateOnly(2023, 4, 3);

        var orchestrationInstanceId = await durableTaskClient
            .ScheduleNewOrchestrationInstanceAsync("DeterministicOrchestrator", referenceDate)
            .ConfigureAwait(false);

        return durableTaskClient.CreateCheckStatusResponse(request, orchestrationInstanceId);
    }

    [Function("DeterministicOrchestrator")]
    public async Task Run(
        [OrchestrationTrigger] TaskOrchestrationContext taskOrchestrationContext,
        DateOnly referenceDate)
    {
        var wrappedDateOnly = new WrappedDateOnly { DateOnly = referenceDate };

        await taskOrchestrationContext
            .CallActivityAsync("FlagDatabaseEntries", wrappedDateOnly)
            .ConfigureAwait(true);

        var events = await taskOrchestrationContext
            .CallActivityAsync<string[]>("CreateEvents", wrappedDateOnly)
            .ConfigureAwait(true);

        // Fan-out/fan-in
        var eventTasks = events
            .Select(x => taskOrchestrationContext.CallActivityAsync("ProcessEvent", input: x))
            .ToList();
        await Task.WhenAll(eventTasks).ConfigureAwait(true);

        await taskOrchestrationContext
            .CallActivityAsync("Export", wrappedDateOnly)
            .ConfigureAwait(true);
    }

    [Function("FlagDatabaseEntries")]
    public Task FlagDatabaseEntries([ActivityTrigger] WrappedDateOnly referenceDate)
    {
        // Flags entries in database table A to be processed using given referenceDate.
        return Task.CompletedTask;
    }

    [Function("CreateEvents")]
    public Task<string[]> CreateEvents([ActivityTrigger] WrappedDateOnly referenceDate)
    {
        // Creates events based on the entries flagged in the database by previous activity.
        return Task.FromResult(Array.Empty<string>());
    }

    [Function("ProcessEvent")]
    public Task ProcessEvent([ActivityTrigger] string eventToProcess)
    {
        // Process event and some of the events result in data being added to database table B.
        return Task.CompletedTask;
    }

    [Function("Export")]
    public Task Export([ActivityTrigger] WrappedDateOnly referenceDate)
    {
        // Export data from the database populated by processing the events.
        return Task.CompletedTask;
    }
}

public class WrappedDateOnly
{
    public DateOnly DateOnly { get; set; }
}

Upvotes: 2

Views: 957

Answers (1)

juunas

Reputation: 58743

The "deterministic" requirement for Durable Functions orchestrators exists only because the Durable Task Framework re-executes (replays) the orchestrator code several times as results come in from activities. Given the same orchestrator input and the same activity outputs (plus external events etc.), the orchestrator code must always go through the same steps. Your orchestrator looks deterministic in this sense.
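
To illustrate what "same steps on every replay" means in practice, here is a minimal sketch, assuming the .NET isolated worker model used in the question. The orchestrator name "ReplaySafeOrchestrator" and the activity name "WeeklyCleanup" are hypothetical and only serve the example. The commented-out lines show the kind of code that would break determinism, because it yields a different value each time the orchestrator is replayed.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Functions.Worker;
using Microsoft.DurableTask;

public class ReplaySafeOrchestrator
{
    [Function("ReplaySafeOrchestrator")] // hypothetical name
    public async Task Run([OrchestrationTrigger] TaskOrchestrationContext context)
    {
        // NOT replay-safe: these return a new value on every replay,
        // so a branch below could be taken differently each time.
        // DateTime now = DateTime.UtcNow;
        // Guid id = Guid.NewGuid();

        // Replay-safe alternatives provided by the orchestration context:
        // the same values are produced again on every replay.
        DateTime now = context.CurrentUtcDateTime;
        Guid id = context.NewGuid();

        // The branch depends only on replay-safe values, so the orchestrator
        // schedules the same activities in the same order each time.
        if (now.DayOfWeek == DayOfWeek.Monday)
        {
            // "WeeklyCleanup" is a hypothetical activity.
            await context.CallActivityAsync("WeeklyCleanup", id);
        }
    }
}
```

Database queries, HTTP calls and other I/O fall in the same category as DateTime.UtcNow: they belong in activities, not in the orchestrator body.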

The issue you describe is worth considering, though, and I think what you suggest makes sense. Returning the IDs to process from one activity and passing them to the next is a common pattern that I've used.
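
A rough sketch of that pattern, based on the code in the question, could look like this. It assumes the IDs in table A are integers and renames the orchestrator to "IdPassingOrchestrator"; both are assumptions for illustration. The "ProcessEvent" and "Export" activities are taken to be the same as in the question and are left out for brevity.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Functions.Worker;
using Microsoft.DurableTask;

public class IdPassingOrchestrator
{
    [Function("IdPassingOrchestrator")] // hypothetical name
    public async Task Run(
        [OrchestrationTrigger] TaskOrchestrationContext context,
        DateOnly referenceDate)
    {
        var wrappedDateOnly = new WrappedDateOnly { DateOnly = referenceDate };

        // The returned IDs are recorded as this activity's output.
        int[] flaggedIds = await context
            .CallActivityAsync<int[]>("FlagDatabaseEntries", wrappedDateOnly);

        // Events are built from the recorded IDs, not from a fresh
        // "what is flagged right now?" query against table A.
        string[] events = await context
            .CallActivityAsync<string[]>("CreateEvents", flaggedIds);

        // Fan-out/fan-in, unchanged from the question.
        var eventTasks = events
            .Select(x => context.CallActivityAsync("ProcessEvent", x))
            .ToList();
        await Task.WhenAll(eventTasks);

        await context.CallActivityAsync("Export", wrappedDateOnly);
    }

    [Function("FlagDatabaseEntries")]
    public Task<int[]> FlagDatabaseEntries([ActivityTrigger] WrappedDateOnly referenceDate)
    {
        // Placeholder: flag entries in table A and return their IDs.
        return Task.FromResult(Array.Empty<int>());
    }

    [Function("CreateEvents")]
    public Task<string[]> CreateEvents([ActivityTrigger] int[] flaggedIds)
    {
        // Placeholder: load exactly these entries and build events from them.
        return Task.FromResult(Array.Empty<string>());
    }
}

public class WrappedDateOnly
{
    public DateOnly DateOnly { get; set; }
}
```

With this shape, the second activity no longer depends on the flag state of table A at the moment it happens to run, only on the list it is handed.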

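One more thing that may need clarification: activities don't run again during replay. Once an activity has returned its result, that result is read from the orchestration history in Table Storage instead of the activity being called again. That is also why E2_GetFileList in the docs example is not a problem: its file list is recorded once and reused on every replay.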
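
The idea can be shown with a toy model in plain C#. This is not how the Durable Task Framework is actually implemented; ToyContext and everything in it are made up for this example. It only mimics the rule "if a result is already in the history, return it instead of running the activity".

```csharp
using System;
using System.Collections.Generic;

// Toy model only: a stand-in for the orchestration context and its history.
public sealed class ToyContext
{
    private readonly List<string> _history; // recorded activity results, in call order
    private int _position;

    public ToyContext(List<string> history) => _history = history;

    // Counts how often an activity was really executed through this context.
    public int ActivityExecutions { get; private set; }

    public string CallActivity(Func<string> activity)
    {
        if (_position < _history.Count)
        {
            // Replay: the result was recorded earlier, so the activity is skipped.
            return _history[_position++];
        }

        // First execution: run the activity and record its result.
        ActivityExecutions++;
        string result = activity();
        _history.Add(result);
        _position++;
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        var history = new List<string>();
        int dbVersion = 1;
        Func<string> getFlaggedIds = () => $"ids-from-db-v{dbVersion}";

        // First execution: the activity runs and its output is recorded.
        var first = new ToyContext(history);
        string original = first.CallActivity(getFlaggedIds);

        dbVersion = 2; // the underlying data changes afterwards

        // Replay with the same history: the recorded output is returned.
        var replay = new ToyContext(history);
        string replayed = replay.CallActivity(getFlaggedIds);

        Console.WriteLine(original);                  // ids-from-db-v1
        Console.WriteLine(replayed);                  // ids-from-db-v1
        Console.WriteLine(replay.ActivityExecutions); // 0
    }
}
```

Even though the "database" changed between the two runs, the replay sees the originally recorded value, which is what keeps the orchestrator on the same path.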

A small thing to note: you don't need ConfigureAwait() on any of the calls. ConfigureAwait(true) is what a plain await does by default anyway.

Upvotes: 2
