user1797466
user1797466

Reputation: 517

Prometheus not receiving metrics from cadvisor in GKE

Heyo,

I've deployed a prometheus, grafana, kube-state-metrics, alertmanager, etc. setup using kubernetes in GKE v1.16.x. I've used https://github.com/do-community/doks-monitoring as a jumping off point for the yaml files.

I've been trying to debug a situation for a few days now and would be very grateful for some help. My prometheus nodes are not getting metrics from cadvisor.

    - job_name: kubernetes-cadvisor
      honor_timestamps: true
      scrape_interval: 15s
      scrape_timeout: 10s
      metrics_path: /metrics/cadvisor
      scheme: https
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)

cribbed from the prometheus/docs/examples.

I've tried a whole bunch of different variations on paths and scrape configs, but no luck. Based on the fact that I can query the metrics using kubectl get (they exist) it seems to me the issue is prometheus communicating with the cadvisor target.

If anyone has experience getting this configured I'd sure appreciate some help debugging.

Cheers

Upvotes: 4

Views: 5559

Answers (2)

vishvjeet
vishvjeet

Reputation: 19

Too Frustrating, I've been digging for past few days.

The issue started since after the gke master upgraded from 1.15.12-gke.2 to 1.16.13-gke.401.

To confirm this, did the same in another gke cluster, and result is same.

and above configuration is giving 403 forbidden.

enter image description here

Upvotes: 0

user1797466
user1797466

Reputation: 517

I was able to dig up a blog that had an example configuration that worked for me. The GKE endpoint for cadvisor (and kubelet) metrics, is different than the standard ones that are found in documentation examples. Here's an excerpt from my working prometheus jobs:

    - job_name: kubernetes-cadvisor
      honor_timestamps: true
      scrape_interval: 15s
      scrape_timeout: 10s
      metrics_path: /metrics/cadvisor
      scheme: https
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc.cluster.local:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - job_name: kubernetes-kubelet
      honor_timestamps: true
      scrape_interval: 15s
      scrape_timeout: 10s
      metrics_path: /metrics
      scheme: https
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc.cluster.local:443
      - target_label: __metrics_path__
        source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        replacement: /api/v1/nodes/${1}/proxy/metrics

Edit: here's a link to the blog post -> https://medium.com/htc-research-engineering-blog/monitoring-kubernetes-clusters-with-grafana-e2a413febefd.

Upvotes: 5

Related Questions