Reputation: 517
Heyo,
I've deployed a Prometheus, Grafana, kube-state-metrics, alertmanager, etc. setup on Kubernetes in GKE v1.16.x, using https://github.com/do-community/doks-monitoring as a jumping-off point for the YAML files.
I've been trying to debug a situation for a few days now and would be very grateful for some help. My Prometheus instances are not getting metrics from cadvisor.
The metrics are there when I query the node directly:
kubectl get --raw "/api/v1/nodes/<your_node>/proxy/metrics/cadvisor"
but when I look in Prometheus for container_cpu_usage or container_memory_usage, there is no data. Here's my scrape config:
- job_name: kubernetes-cadvisor
  honor_timestamps: true
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics/cadvisor
  scheme: https
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
That job is cribbed from the prometheus/docs/examples.
I've tried a whole bunch of different variations on paths and scrape configs, but no luck. Since I can query the metrics using kubectl get --raw (so they exist), it seems to me the issue is Prometheus communicating with the cadvisor target.
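For reference, my service account RBAC is based on the standard Prometheus example, with a ClusterRoleBinding attaching it to the Prometheus service account. Roughly this (a sketch of the upstream example, not my exact manifest):
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
# nodes/proxy is what allows scraping kubelet and cadvisor
# metrics through the API server's node proxy
- apiGroups: [""]
  resources: ["nodes", "nodes/proxy", "services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]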
If anyone has experience getting this configured I'd sure appreciate some help debugging.
Cheers
Upvotes: 4
Views: 5559
Reputation: 19
Very frustrating; I've been digging into this for the past few days as well.
The issue started after the GKE master was upgraded from 1.15.12-gke.2 to 1.16.13-gke.401. To confirm this, I did the same in another GKE cluster, and the result is the same: the above configuration gives a 403 Forbidden.
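A 403 suggests the request is reaching the endpoint but the service account isn't authorized for it, rather than a connectivity problem. One way to check (assuming the service account is named prometheus in the monitoring namespace; substitute whatever your deployment uses):
kubectl auth can-i get nodes --subresource=proxy \
  --as=system:serviceaccount:monitoring:prometheus
If that prints no, the ClusterRole/ClusterRoleBinding is missing the nodes/proxy resource.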
Upvotes: 0
Reputation: 517
I was able to dig up a blog post with an example configuration that worked for me. The GKE endpoints for cadvisor (and kubelet) metrics are different from the standard ones found in documentation examples: the metrics are scraped through the API server's node proxy instead of from the kubelet directly. Here's an excerpt from my working prometheus jobs:
- job_name: kubernetes-cadvisor
  honor_timestamps: true
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics/cadvisor
  scheme: https
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  # send the scrape to the API server instead of to the node itself
  - target_label: __address__
    replacement: kubernetes.default.svc.cluster.local:443
  # rewrite the path to the API server's proxy for each discovered node
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
- job_name: kubernetes-kubelet
  honor_timestamps: true
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  # same proxy trick for the kubelet's own /metrics
  - target_label: __address__
    replacement: kubernetes.default.svc.cluster.local:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics
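Once Prometheus reloads this config, the nodes show up as UP under Status -> Targets, and a query like the following returns per-pod CPU data (just a sanity check I'd run, and note that on 1.16 the cadvisor label is pod rather than the older pod_name):
sum(rate(container_cpu_usage_seconds_total{image!=""}[5m])) by (namespace, pod)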
Edit: here's a link to the blog post -> https://medium.com/htc-research-engineering-blog/monitoring-kubernetes-clusters-with-grafana-e2a413febefd.
Upvotes: 5