Der Ditsch

Reputation: 83

How to configure Prometheus in a multi-location scenario?

I love using Prometheus for monitoring and alerting. Until now, all my targets (nodes and containers) lived on the same network as the monitoring server.

But now I'm facing a scenario where we will deploy our application stack (as a bunch of Docker containers) to several client machines in their networks. Nearly all of the clients' networks are behind a firewall or NAT, so scraping becomes quite difficult.

As we're still accountable for our stack, I'd like to have central monitoring, alerting, and dashboards.

I was wondering what the best architecture could be if I want to implement this with Prometheus, but I couldn't find any convincing approaches. My ideas so far:

  1. Use a Pushgateway on our side and push all data out of the client networks. As the docs state, it's not intended that way: https://prometheus.io/docs/practices/pushing/
  2. Use a federation setup (https://prometheus.io/docs/prometheus/latest/federation/): Place a Prometheus server in every client network behind a reverse proxy (to enable SSL and authentication) and aggregate relevant metrics there. Open/forward just a single port for federation scraping.
  3. Other more experimental setups, such as SSH Tunneling (e.g. here https://miek.nl/2016/february/24/monitoring-with-ssh-and-prometheus/) or VPN!?

Thank you in advance for your help!

Upvotes: 4

Views: 2371

Answers (1)

night-gold

Reputation: 2421

Nobody has posted an answer yet, so I'll give my opinion on the second option, because that's what I think I would do in your situation.

The second setup seems the most flexible: you have access to the data and only need to open one port for the federating server to scrape, so it should still be secure.
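A minimal sketch of what that federation job could look like on the central server, assuming the client-side Prometheus sits behind a reverse proxy at `client1.example.com:9443` with basic auth (the hostname, port, credentials, and `job="app"` selector are all placeholders for your setup):

```yaml
# prometheus.yml on the central monitoring server (sketch)
scrape_configs:
  - job_name: 'federate-client1'
    scrape_interval: 30s
    honor_labels: true            # keep the labels set by the client-side Prometheus
    metrics_path: '/federate'
    scheme: https                 # TLS terminated by the client's reverse proxy
    basic_auth:
      username: 'prometheus'
      password: 'changeme'
    params:
      'match[]':
        - '{job="app"}'           # pull only the relevant aggregated series
    static_configs:
      - targets: ['client1.example.com:9443']
        labels:
          client: 'client1'       # distinguish metrics per client network
```

You'd repeat one such job (or one target per `static_configs` entry with a different `client` label) for each client network.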

Another bonus of this type of setup is that even if the firewall stops working for one reason or another, you will still have a Prometheus instance scraping locally. You will get an alert because you won't be able to reach the server(s), but when the connection comes back you will still have all the data. There won't be a hole in the Grafana dashboards from missing data, apart from during the incident itself.

The issue with this setup is that you need to maintain as many servers as you have client networks. A solution for this would be to build a Packer image, or perhaps use an Ansible playbook for deployment.
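As an illustration, the Ansible route could be as simple as a playbook like the one below, which pushes the scrape config and runs Prometheus as a container (the group name, image tag, and file paths are assumptions, not part of the original question):

```yaml
# deploy-prometheus.yml (sketch)
- hosts: client_prometheus
  become: true
  tasks:
    - name: Copy Prometheus configuration
      ansible.builtin.copy:
        src: files/prometheus.yml
        dest: /etc/prometheus/prometheus.yml

    - name: Run Prometheus as a Docker container
      community.docker.docker_container:
        name: prometheus
        image: prom/prometheus:v2.45.0
        restart_policy: always
        ports:
          - "9090:9090"
        volumes:
          - /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
```

Adding a new client network then mostly means adding a host to the inventory and re-running the playbook.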

Upvotes: 1
