Reputation: 1097
I'm using Knative Serving with the KPA. Knative can autoscale based on concurrency and RPS, but we need to scale different services based on queue length because they run long-running async processes. Is there any way to achieve this in Knative? I can't use the HPA with Knative because we need Knative's scale-to-zero feature. Thanks in advance!
Upvotes: 4
Views: 566
Reputation: 3493
If you have async work (background or scheduled processes), it's likely that Knative is not a good match for your application. There has been some investigation into exposing the HPA v2 custom metrics scaling options (which might preclude scale to zero, as you note), but even with HPA v2 scaling, you'll still run into problems.
The problem with background processes is that Knative and Kubernetes don't have visibility into which Pods are still doing work, so they are equally likely to shut down a Pod doing work as one that is idle.
One workaround would be to make the async work synchronous with a request (possibly by using eventing to send a "do work" event) and then process those events synchronously. The eventing Broker won't get upset if your requests take a long time to complete, and because each job is now an in-flight request, the autoscaler can see which Pods are busy. If you're worried about non-uniform processing times, you can even run a second copy of the Knative Service just for handling the long-running requests.
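To illustrate the workaround, here is a minimal sketch of a handler that does the work inside the request instead of handing it to a background task. It uses only the Python standard library. The `process_job` function, the JSON payload shape, and the `make_server` helper are hypothetical names invented for this example; a real "do work" event delivered by eventing would arrive as an HTTP POST in much the same way, but its exact contents depend on your setup.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def process_job(job: dict) -> dict:
    """Hypothetical stand-in for the long-running work."""
    return {"job_id": job.get("id"), "status": "done"}


class WorkHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the "do work" event from the request body.
        length = int(self.headers.get("Content-Length", 0))
        job = json.loads(self.rfile.read(length) or b"{}")

        # Do the work before responding, so the request stays in flight
        # (and counts toward concurrency) for as long as the job runs.
        result = process_job(job)

        body = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 0) -> ThreadingHTTPServer:
    # Port 0 asks the OS for a free port, which is handy for local testing.
    return ThreadingHTTPServer(("0.0.0.0", port), WorkHandler)


if __name__ == "__main__":
    # Knative Serving passes the listening port in the PORT variable.
    make_server(int(os.environ.get("PORT", "8080"))).serve_forever()
```

With this shape, the number of in-flight requests roughly tracks the number of jobs being processed, so concurrency-based scaling (including scale to zero when nothing is queued) follows the workload. You would probably also want to set the revision's `timeoutSeconds` high enough for your longest job and pick a `containerConcurrency` that matches how many jobs one Pod can handle at once.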
Upvotes: 0