Sam

Reputation: 6265

How to manage execution history for Azure Durable Functions

I'm developing an ETL process that uses Durable Functions (v2) for execution. The basic process is as follows:

  1. Use an activity to retrieve the list of product codes to be processed
  2. Fan out from the main orchestrator into N sub-orchestrations that merge multiple integration data sources into a single object and update it in Cosmos DB

The main orchestration is implemented using the singleton instance pattern so that only one instance is running at a time.

It's working fine, but the execution history in the underlying TaskHub table storage grows significantly with each execution. Because this process will run every hour, that is an obvious maintenance concern.

I'm struggling to find guidance on how to maintain the execution history of this process so that it doesn't grow too much. I'm aware of the ContinueAsNew() API, but it doesn't fit my design very well, because it also forces the process to run again. I can't find any API for clearing execution history either.

Is it a matter of manually clearing the tables directly for now, say using separate timer-triggered functions? This feels a bit hacky / volatile, given that the schema of the durable functions tables could change at any point.

Upvotes: 3

Views: 2028

Answers (3)

R.Titov

Reputation: 3293

The GetStatusAsync overload that filters by creation time and status is obsolete in Durable Functions 2.x, but you can use ListInstancesAsync instead:

const int DefaultPageSize = 100;
OrchestrationStatusQueryResult statusQueryResult = null;

do
{
    // Fetch the next page of finished instances; the token is null on the first pass.
    statusQueryResult = await client.ListInstancesAsync(
        new OrchestrationStatusQueryCondition
        {
            CreatedTimeFrom = creationTimeFrom,
            CreatedTimeTo = creationTimeTo,
            RuntimeStatus = new[]
            {
                OrchestrationRuntimeStatus.Completed,
                OrchestrationRuntimeStatus.Failed,
                OrchestrationRuntimeStatus.Canceled,
            },
            PageSize = DefaultPageSize,
            ContinuationToken = statusQueryResult?.ContinuationToken,
        },
        CancellationToken.None);

    // Purge the history of every instance on this page.
    foreach (var instance in statusQueryResult.DurableOrchestrationState)
    {
        await client.PurgeInstanceHistoryAsync(instance.InstanceId);
    }
} while (statusQueryResult.ContinuationToken != null);

Upvotes: 1

Thorkil Holm-Jacobsen

Reputation: 7676

Durable Functions 1.7 introduced Orchestration History Purging, which allows you to delete all data relating to a specified instance:

await client.PurgeInstanceHistoryAsync(instanceId);

You will still have to implement the triggering logic yourself (e.g. a timer-triggered function). To find the instances you wish to delete, you can use the GetStatusAsync method, which lets you query on creation time and instance status:

var instances = await client.GetStatusAsync(
    creationTimeFrom, 
    creationTimeTo,
    new[] { OrchestrationRuntimeStatus.Completed, OrchestrationRuntimeStatus.Failed, OrchestrationRuntimeStatus.Canceled });
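
Putting the two pieces together, a timer-triggered cleanup function might look roughly like the sketch below. It assumes Durable Functions 2.x and uses the PurgeInstanceHistoryAsync overload that takes a time range and a status list, so no separate query is needed. Note that this overload takes OrchestrationStatus from DurableTask.Core rather than OrchestrationRuntimeStatus. The function name, the daily 03:00 schedule, and the 7-day retention window are made up for illustration.

```csharp
using System;
using System.Threading.Tasks;
using DurableTask.Core;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Extensions.Logging;

public static class PurgeOrchestrationHistory
{
    // Hypothetical retention window: keep the last 7 days of history.
    private static readonly TimeSpan Retention = TimeSpan.FromDays(7);

    [FunctionName("PurgeOrchestrationHistory")]
    public static async Task Run(
        [TimerTrigger("0 0 3 * * *")] TimerInfo timer, // assumed schedule: daily at 03:00
        [DurableClient] IDurableOrchestrationClient client,
        ILogger log)
    {
        // Purge every finished instance created before the retention cutoff.
        PurgeHistoryResult result = await client.PurgeInstanceHistoryAsync(
            DateTime.MinValue,
            DateTime.UtcNow - Retention,
            new[]
            {
                OrchestrationStatus.Completed,
                OrchestrationStatus.Failed,
                OrchestrationStatus.Terminated,
            });

        log.LogInformation("Purged {Count} orchestration instances.", result.InstancesDeleted);
    }
}
```

Only instances in a finished state are included in the status filter, so a running singleton orchestration would be left alone.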

Upvotes: 4

Ling Toh

Reputation: 2474

Yes, you have to manually delete the table entries for now or automate it using an out-of-band workflow or a Timer-Triggered Function.

There is an open GitHub issue tracking this at https://github.com/Azure/azure-functions-durable-extension/issues/17

Engineering work to address this issue has already begun; see https://github.com/Azure/durabletask/pull/216

Upvotes: 2
