Reputation: 3208
I have a service running in a k8s cluster, which I want to monitor using Prometheus Operator. The service has a /metrics
endpoint, which returns simple data like:
myapp_first_queue_length 12
myapp_first_queue_processing 2
myapp_first_queue_pending 10
myapp_second_queue_length 4
myapp_second_queue_processing 4
myapp_second_queue_pending 0
The API runs in multiple pods, behind a basic Service object:
apiVersion: v1
kind: Service
metadata:
  name: myapp-api
  labels:
    app: myapp-api
spec:
  ports:
    - port: 80
      name: myapp-api
      targetPort: 80
  selector:
    app: myapp-api
I've installed Prometheus using kube-prometheus, and added a ServiceMonitor object:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-api
  labels:
    app: myapp-api
spec:
  selector:
    matchLabels:
      app: myapp-api
  endpoints:
    - port: myapp-api
      path: /api/metrics
      interval: 10s
Prometheus discovers all the pods running instances of the API, and I can query those metrics from the Prometheus graph. So far so good.
The issue is, those metrics are aggregate - each API instance/pod doesn't have its own queue, so there's no reason to collect those values from every instance. In fact it seems to invite confusion - if Prometheus collects the same value from 10 pods, it looks like the total value is 10x what it really is, unless you know to apply something like avg.
Is there a way to either tell Prometheus "this value is already aggregate and should always be presented as such" or better yet, tell Prometheus to just scrape the values once via the internal load balancer for that service, rather than hitting each pod?
Edit:
The actual API is just a simple Deployment object:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-api
  labels:
    app: myapp-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp-api
  template:
    metadata:
      labels:
        app: myapp-api
    spec:
      imagePullSecrets:
        - name: mysecret
      containers:
        - name: myapp-api
          image: myregistry/myapp:2.0
          ports:
            - containerPort: 80
          volumeMounts:
            - name: config
              mountPath: "app/config.yaml"
              subPath: config.yaml
      volumes:
        - name: config
          configMap:
            name: myapp-api-config
Upvotes: 2
Views: 3365
Reputation: 4470
Prometheus Operator developers are kindly working (as of Jan 2023) on a generic ScrapeConfig CRD that is designed to solve exactly the use case you describe: https://github.com/prometheus-operator/prometheus-operator/issues/2787
In the meantime, you can use the "additional scrape config" facility of Prometheus Operator to set up a custom scrape target.
The idea is that the configured scrape target is hit only once per scrape interval. The Service name resolves to the Service's cluster IP, which load-balances the request to one of the N pods behind the Service, thus avoiding duplicate metrics.
In particular, you can override the kube-prometheus-stack Helm values as follows:
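prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      # Scrape the Service address once instead of every pod behind it.
      - job_name: 'myapp-api-aggregates'
        metrics_path: '/api/metrics'
        scheme: 'http'
        static_configs:
          # The short name resolves only from the Service's own namespace;
          # otherwise use myapp-api.<namespace>.svc:80.
          - targets: ['myapp-api:80']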
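For operator releases that already ship the ScrapeConfig CRD mentioned above, the same static target could probably be declared as a Kubernetes object instead of a Helm value. This is only a sketch: the object name, the `release` label and the `default` namespace are assumptions, and your Prometheus object has to select ScrapeConfig resources (via its scrapeConfigSelector) for it to take effect.

```yaml
# Sketch only: requires an operator version that includes the ScrapeConfig CRD.
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: myapp-api-aggregates          # hypothetical name
  labels:
    release: kube-prometheus-stack    # assumption: must match your scrapeConfigSelector
spec:
  metricsPath: /api/metrics
  staticConfigs:
    - targets:
        - myapp-api.default.svc:80    # assumption: the Service lives in "default"
```

As with the Helm override, the single Service address is the only target, so each scrape lands on just one of the pods.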
Upvotes: 1
Reputation: 3613
In your case, to avoid the duplicated values you can either use the avg() operator, as already mentioned in your post, or use a PodMonitor instead of a ServiceMonitor.
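If you go the avg() route, one way to avoid repeating it in every query is a recording rule that stores the de-duplicated series under its own name. The following is a minimal sketch, assuming the per-pod series differ only in the `pod` and `instance` labels; the rule name, group name and `release` label are hypothetical and must fit your own ruleSelector.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: myapp-api-aggregates          # hypothetical name
  labels:
    release: kube-prometheus-stack    # assumption: must match your ruleSelector
spec:
  groups:
    - name: myapp-api.aggregates
      rules:
        # Collapse the identical per-pod copies into a single series.
        - record: myapp:first_queue_length:avg
          expr: avg without (pod, instance) (myapp_first_queue_length)
```

Dashboards and alerts can then query `myapp:first_queue_length:avg` directly; the other queue metrics would need one rule each.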
The PodMonitor custom resource definition (CRD) allows you to declaratively define how a dynamic set of pods should be monitored. Which pods are selected to be monitored with the desired configuration is defined using label selectors.
This way it will scrape the metrics from the specified pod only.
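A PodMonitor for this API might look roughly like the sketch below. It assumes the container port in the Deployment is given a name (here the hypothetical `http`), since the endpoint refers to the port by name. Note that a label selector matches every pod carrying the label, so to target one specific pod it would need a label that only that pod has.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: myapp-api
  labels:
    app: myapp-api
spec:
  selector:
    matchLabels:
      app: myapp-api            # selects every pod with this label
  podMetricsEndpoints:
    - port: http                # assumption: containerPort 80 is named "http"
      path: /api/metrics
      interval: 10s
```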
Upvotes: 1