Reputation: 308
I've been using Prometheus for a while but have trouble figuring this one out.
We're implementing a blue/green deployment setup that will be monitored by Prometheus. All exporters are discovered through Consul and scraped by a local Prometheus server, which is in turn scraped through federation. That makes the setup easier to secure and gives us a single monitoring access point for the whole setup.
Now, let us say blue is in production. We'll collect metrics like latency and also system metrics for debugging if necessary.
When green is not in production, most of its servers will be stopped. So there will not be a green-mysql responding.
What is the best practice to tackle this? We cannot check for MySQL alone, as that would allow the blue DB to be down while the green one responds, even though green is not in production. If we check both, there will be alerts when shutting down an inactive side we no longer care about. We can switch the alerting priority manually, but that does not seem like a good solution.
I've been searching online, but what I found only recommended monitoring services instead of machines. I agree with that, but we cannot check the green MySQL service if green is completely or partially stopped.
Can we read out a variable from one of our machines and use it to switch the monitoring priorities? I don't think Prometheus supports that.
Any hint or reading material pointing me in the right direction is appreciated.
Upvotes: 0
Views: 2241
Reputation: 34112
Another option that doesn't leak into all your alerts would be to have a silence that you switch as you change environments.
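As a rough sketch of what switching a silence could look like from a deploy script, the function below builds a silence payload in the shape I believe Alertmanager's v2 API accepts on `POST /api/v2/silences`. The `color` label, the `deploy-script` author, and the comment text are assumptions for illustration, not something from this setup.

```python
from datetime import datetime, timedelta, timezone


def build_silence(color: str, hours: int, now: datetime) -> dict:
    """Build a silence payload matching every alert labeled with the given color."""
    return {
        # Assumption: alerts carry a `color` label with the value "blue" or "green".
        "matchers": [{"name": "color", "value": color, "isRegex": False}],
        # RFC 3339 timestamps; `now` should be timezone-aware.
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(hours=hours)).isoformat(),
        "createdBy": "deploy-script",  # hypothetical author name
        "comment": f"{color} is on standby",
    }
```

When switching environments, the script would post this payload as JSON for the side going to standby and expire the silence that covered the side going live.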
Upvotes: 1
Reputation: 308
I'm actually going to respond to this myself.
We're going to add a /metrics page to our application, which knows whether blue or green is active thanks to the shared Consul k/v store.
The result will be something like this.
myapp_blue_live{region="xxx"} 0
myapp_green_live{region="xxx"} 1
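A minimal sketch of how such a page could be served, using only the Python standard library. The `get_active_color` helper is a hypothetical placeholder for the Consul k/v lookup, and the handler is not the application's actual code.

```python
from http.server import BaseHTTPRequestHandler

COLORS = ("blue", "green")


def render_live_metrics(active_color: str, region: str) -> str:
    """Render one gauge per color in the Prometheus text exposition format."""
    if active_color not in COLORS:
        raise ValueError(f"unknown color: {active_color!r}")
    lines = []
    for color in COLORS:
        name = f"myapp_{color}_live"
        value = 1 if color == active_color else 0
        lines.append(f"# TYPE {name} gauge")
        lines.append(f'{name}{{region="{region}"}} {value}')
    return "\n".join(lines) + "\n"


def get_active_color() -> str:
    # Hypothetical placeholder: the real app would read this from Consul k/v.
    return "green"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the liveness gauges on /metrics and 404 everything else."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_live_metrics(get_active_color(), region="xxx").encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

Passing `MetricsHandler` to `http.server.HTTPServer` would expose the two gauges for scraping, with the live color reporting 1 and the standby color reporting 0.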
Thanks to this, we can use Prometheus's if syntax in the alerts and say things like the following (simplified) configuration:
if myapp_blue_live == 0 and mysql_errors > 0
This way our monitoring always follows the live environment. Alerts for the standby color can be routed to mail/Slack and handled the next business day.
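The intended decision can be written out as a small sketch. It assumes a value of 1 means the color is live, as in the sample output, and the route names `page` and `next-business-day` are made up for illustration. In Prometheus itself this would be expressed in the alert rules and Alertmanager routing rather than in Python.

```python
from typing import Optional


def route_mysql_alert(color_live: int, mysql_errors: float) -> Optional[str]:
    """Decide where a MySQL alert for one color should go, or None if it is healthy."""
    if mysql_errors <= 0:
        return None  # nothing to alert on
    # Errors on the live color need attention now; standby errors can wait.
    return "page" if color_live == 1 else "next-business-day"
```

For example, errors on the live side return `page`, while the same errors on the standby side return `next-business-day`.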
Upvotes: 0