Reputation: 5502
I have a cloud run service. It's set up to handle sporadic scientific data processing tasks.
It has min_instances=1, max_instances=40. It'll sit with nothing active, then burst up to 5-10 active containers a few times a week.
It's on the second generation runtime.
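For context, the service is deployed with flags along these lines (the service name, image, and region below are placeholders, not our real values):

```shell
# Hypothetical deploy roughly matching the setup described above.
# --min-instances keeps one warm instance; --max-instances caps the burst.
gcloud run deploy data-processor \
  --image=gcr.io/my-project/data-processor:latest \
  --region=europe-west1 \
  --min-instances=1 \
  --max-instances=40
```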
Our bills have been gradually increasing on GCP, and we hadn't thought much of it because usage has increased. But recently we did a bunch of work to streamline services and reduce costs, none of which resulted in the expected cost reductions. Drilling down into the cost SKUs (a UX nightmare; GCP really needs a better ability to segment things), I discovered that an overwhelming proportion of the cost is Idle CPU and Idle RAM in Cloud Run.
Combing through the 10+ Cloud Run services we have, two of them are exhibiting a very weird behaviour: The number of idle instances is very high. In the case below, we have 18 machines running CONSTANTLY. These aren't cheap machines, either!
The docs on Cloud Run autoscaling are pretty clear: "To minimize the impact of cold starts, Cloud Run may keep some instances idle for a maximum of 15 minutes." but these containers are on permanently.
With min_instances=1, how can there possibly be 18 idle instances constantly up?
Upvotes: 7
Views: 1694
Reputation: 5502
Answering my own question after a postmortem.
The CI/CD system automatically releases a new revision of the service every time we merge to main on GitHub. 100% of traffic then gets directed to the new revision.
Each revision on Cloud Run is tagged with the code version of that release (e.g. v0-1-2 for version 0.1.2). This seems sensible, as it allows us to roll back straightforwardly to a given version.
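The CI step amounts to something like the following (these are real gcloud run deploy flags; the names are placeholders):

```shell
# Run by CI on every merge to main. --tag attaches a traffic tag named
# after the code version to the newly created revision; by default the
# new revision also takes 100% of traffic.
gcloud run deploy data-processor \
  --image=gcr.io/my-project/data-processor:v0.1.2 \
  --region=europe-west1 \
  --tag=v0-1-2
```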
Since we implemented that CI system, we have made 18 releases. It turns out that if you tag a revision, even if no traffic is going to that revision, Cloud Run will keep that revision alive, respecting its min_instances parameter.
This is to allow traffic to be routed to a specific revision via the revision_url, which is derived from the tag, regardless of the traffic-routing settings.
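Concretely, a traffic tag gives the revision a stable URL with the tag as a prefix, reachable regardless of the traffic split; the service suffix and hash below are illustrative:

```shell
# Tagged revisions get a URL of the form
#   https://TAG---SERVICE-HASH-REGION.a.run.app
# e.g. for tag v0-1-2 (illustrative values):
curl https://v0-1-2---data-processor-abc123xyz-ew.a.run.app/
```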
But that prevents us from turning off old service revisions. There are only three ways to turn an old revision off so that your costs don't keep climbing with every release:
- remove its tag (which I find crazy, because the tag is how you know which revision is which)
- copy the old revision's settings to a new revision, override min_instances to 0, release the new revision, and apply the old tag to the new revision (completely disingenuous, because it's not the same revision at all)
- never, ever set the min_instances parameter
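For completeness, the tag-removal route looks like this (--remove-tags is a real flag on gcloud run services update-traffic; the service name is a placeholder):

```shell
# Removing the tag frees the old revision from its min_instances
# reservation, so it can finally scale to zero.
gcloud run services update-traffic data-processor \
  --region=europe-west1 \
  --remove-tags=v0-1-2
```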
If anyone from GCP reads this: whether something is tagged should not determine whether it's kept alive. An active flag that we could toggle to activate/deactivate service revisions would be really beneficial.
Maybe GCP saw this post, who knows?! But GCP has released a new feature differentiating between service-level and revision-level min instances, so it's now possible to work around this problem. See Steren's answer below.
Upvotes: 8
Reputation: 7927
You are using traffic tags and have set minimum instances on the tagged revisions. Cloud Run allocates minimum instances for every tagged revision, even one that receives no traffic.
To keep using traffic tags while applying minimum instances only to the revisions serving your production traffic, switch to service-level minimum instances.
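The gcloud invocation is along these lines, though the feature was in preview at the time of writing, so treat the exact flag name as an assumption and verify it against the current docs:

```shell
# Sketch (flag name unverified): set minimum instances at the service
# level, so only revisions that are serving traffic keep warm instances.
gcloud beta run services update data-processor \
  --region=europe-west1 \
  --min=1
```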
Upvotes: 2