Reputation: 6265
I'm developing an ETL process that uses Durable Functions (v2) for execution. The basic process is as follows:
The main orchestration is implemented using the singleton instance pattern so that only one instance is running at a time.
It's working fine, but the execution history in the underlying TaskHub table storage grows significantly with each execution of this process. That is an obvious maintenance concern, because the process will run every hour and generate a lot of data in the underlying TaskHub tables.
I'm struggling to find guidance on how to maintain the execution history of this process so that it doesn't grow too much. I'm aware of the ContinueAsNewAsync() API, but it doesn't fit my design well, because it also forces the process to run again. I can't find any API for clearing execution history either.
Is it a matter of manually clearing the tables directly for now, say using separate timer-triggered functions? This feels a bit hacky / volatile, given that the schema of the durable functions tables could change at any point.
Upvotes: 3
Views: 2028
Reputation: 3293
GetStatusAsync is obsolete, but you can use ListInstancesAsync instead:
const int DefaultPageSize = 100;
OrchestrationStatusQueryResult statusQueryResult = null;
do
{
    // Fetch the next page of finished instances; the token is null on the first call.
    statusQueryResult = await client.ListInstancesAsync(
        new OrchestrationStatusQueryCondition
        {
            CreatedTimeFrom = creationTimeFrom,
            CreatedTimeTo = creationTimeTo,
            RuntimeStatus = new[]
            {
                OrchestrationRuntimeStatus.Completed,
                OrchestrationRuntimeStatus.Failed,
                OrchestrationRuntimeStatus.Canceled,
            },
            PageSize = DefaultPageSize,
            ContinuationToken = statusQueryResult?.ContinuationToken,
        }, CancellationToken.None);

    // Purge the history of every instance on this page.
    foreach (var instance in statusQueryResult.DurableOrchestrationState)
    {
        await client.PurgeInstanceHistoryAsync(instance.InstanceId);
    }
} while (statusQueryResult.ContinuationToken != null);
Upvotes: 1
Reputation: 7676
Durable Functions 1.7 introduced Orchestration History Purging, which allows you to delete all data relating to a specified instance:
await client.PurgeInstanceHistoryAsync(instanceId);
You will still have to implement the triggering logic (e.g. a timer-triggered function). To find the instances you wish to delete, you can use the GetStatusAsync method, which allows you to query on creation time and instance status:
var instances = await client.GetStatusAsync(
creationTimeFrom,
creationTimeTo,
new[] { OrchestrationRuntimeStatus.Completed, OrchestrationRuntimeStatus.Failed, OrchestrationRuntimeStatus.Canceled });
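For the triggering logic, here is a minimal sketch of a timer-triggered function, assuming Durable Functions v2 (on 1.x the binding would be [OrchestrationClient] with DurableOrchestrationClient instead). It uses the time-range overload of PurgeInstanceHistoryAsync, so the instances don't have to be listed first. The function name, the 03:00 schedule and the 7-day retention window are placeholders I made up, not recommendations:

```csharp
using System;
using System.Threading.Tasks;
using DurableTask.Core;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Extensions.Logging;

public static class PurgeOrchestrationHistory
{
    // Hypothetical retention window; adjust to how long you need the history.
    private static readonly TimeSpan Retention = TimeSpan.FromDays(7);

    [FunctionName("PurgeOrchestrationHistory")]
    public static async Task Run(
        // Assumed schedule: once a day at 03:00.
        [TimerTrigger("0 0 3 * * *")] TimerInfo timer,
        [DurableClient] IDurableOrchestrationClient client,
        ILogger log)
    {
        // Purge every finished instance created before the retention cutoff.
        // Note this overload takes DurableTask.Core.OrchestrationStatus,
        // not OrchestrationRuntimeStatus.
        PurgeHistoryResult result = await client.PurgeInstanceHistoryAsync(
            DateTime.MinValue,
            DateTime.UtcNow - Retention,
            new[]
            {
                OrchestrationStatus.Completed,
                OrchestrationStatus.Failed,
                OrchestrationStatus.Canceled,
            });

        log.LogInformation("Purged {Count} orchestration instances.", result.InstancesDeleted);
    }
}
```

Only instances in a finished state are targeted, so a singleton orchestration that is still running should not be touched by this filter.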
Upvotes: 4
Reputation: 2474
Yes, you have to manually delete the table entries for now or automate it using an out-of-band workflow or a Timer-Triggered Function.
There is an open GitHub issue tracking this at https://github.com/Azure/azure-functions-durable-extension/issues/17
Engineering work to address this issue has already begun; see https://github.com/Azure/durabletask/pull/216
Upvotes: 2