Reputation: 1770
We have a long-running process in an App Service app (it can take over 5 minutes) that was timing out because of the non-configurable 230-second timeout on Azure's built-in load balancer.
So we refactored to the async-http API pattern using Azure Durable Functions. We have a single activity function that cannot be easily broken down into smaller bits of work for reasons that are beyond the scope of this question.
I noticed strange results in the output log, and determined that the activity function gets restarted by Azure Functions after a few minutes. I set a breakpoint in the activity function and it gets hit (again) after a few minutes.
This isn't something I configured myself; my calling code that starts the function only executes once. What's going on? How can I make the activity function run to completion?
It works fine and completes as expected when the workload is under a few minutes.
The function app code looks something like this:
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using System.Web.Http;
using OurContentModel;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Azure.WebJobs.Host;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json;

namespace Content20.Store
{
    public class StoreContent
    {
        /// <summary>
        /// Starter function called by HTTP. Starts the orchestrator and returns an endpoint the client
        /// can query for status and for the result once complete.
        /// </summary>
        /// <remarks>See https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-overview?tabs=csharp#async-http </remarks>
        [FunctionName("StoreContent")]
        public async Task<IActionResult> HttpStart(
            [HttpTrigger(AuthorizationLevel.Anonymous, "get", "post")]
            HttpRequest req,
            [DurableClient] IDurableOrchestrationClient starter,
            ILogger log)
        {
            // Function input comes from the request content and query params.
            // ...
            var content = JsonConvert.DeserializeObject<OurData>(requestBody);
            string instanceId = await starter.StartNewAsync(
                "StoreContent_RunOrchestrator",
                new StoreContentInputArgs()
                {
                    OurContent = content
                });
            log.LogInformation($"Started orchestration with ID = '{instanceId}'.");
            return starter.CreateCheckStatusResponse(req, instanceId);
        }

        /// <summary>
        /// Orchestration function that calls the activity function(s)
        /// and returns the final result when they're done.
        /// </summary>
        [FunctionName("StoreContent_RunOrchestrator")]
        public async Task<StoreContentResult> RunOrchestrator(
            [OrchestrationTrigger] IDurableOrchestrationContext context)
        {
            var input = context.GetInput<StoreContentInputArgs>();
            return await context.CallActivityAsync<StoreContentResult>("StoreContent_WriteFamilyData", input);
        }

        /// <summary>
        /// Activity function that does the actual work.
        /// </summary>
        [FunctionName("StoreContent_WriteFamilyData")]
        public async Task<StoreContentResult> WriteFamilyData([ActivityTrigger] StoreContentInputArgs input, ILogger log)
        {
            try
            {
                // A breakpoint here gets hit a second time when the first invocation takes more than a few minutes,
                // with "external code" below it in the call stack, so I assume it's getting (re)started by the system?
                var storer = new OurContentStorer(log);
                await storer.StoreContentAsync(input); // long-running process
                return new StoreContentResult()
                {
                    Success = true,
                    Message = "OK"
                };
            }
            catch (Exception ex)
            {
                log.LogError(ex, ex.ToString());
                return new StoreContentResult()
                {
                    Success = false,
                    Message = ex.Message
                };
            }
        }
    }
}
We have already increased the timeout of the function to an hour in host.json. It's running on a premium plan in Azure.
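For reference, a timeout override like the one described here is normally set through the functionTimeout property in host.json. The fragment below is a minimal sketch, assuming a v2+ Functions host; the one-hour value matches the description above, and every other setting in the real file is omitted.

```json
{
  "version": "2.0",
  "functionTimeout": "01:00:00"
}
```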
The second time the breakpoint in the activity function is hit, the call stack shows only "external code" below the activity function.
Reputation: 1770
It turns out the "restarts" were really just leftover invocations. At some point I'd killed the process (this is all local debug using Azure Functions Tools) and it left some state around in my Azure Storage emulator. So my local Azure tools thought the durable function was still running and/or needed to be restarted.
To get rid of the zombie invocations I just deleted all the "testhub" tables, queues, and blobs from my emulator using Azure Storage Explorer. These get auto-generated when you run a durable function locally.
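An alternative to deleting the emulator state by hand is to point the local app at a fresh task hub, so that whatever is left in the old hub's tables, queues, and blobs is simply never picked up. This is a sketch rather than what I actually did, assuming Durable Functions 2.x; the hub name is made up for illustration, and the old storage artifacts stay in the emulator until they are deleted.

```json
{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "hubName": "StoreContentLocalDev2"
    }
  }
}
```

Task hub names must be alphanumeric and start with a letter, so the name cannot contain hyphens or underscores.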