Hebe Hilhorst

Reputation: 363

Confused by Prometheus 'elements': there seem to be three per pod?

Situation: I just set up Prometheus for my Kubernetes cluster and wanted to take a closer look at resource usage. It seemed like the way to do this was with container_cpu_usage_seconds_total, but I'm confused by the results.

Problem: Prometheus seems to be returning three different elements for each (single-container) pod. I don't know what these different elements represent, and haven't been able to find a clear explanation on Google.


What I'm doing:

I'm executing this query, where MY-POD-IDENTIFIER is the name of a Kubernetes Pod with one defined container:

rate(container_cpu_usage_seconds_total{pod="MY-POD-IDENTIFIER"}[5m])

What I'm seeing:

The Console tab gives me a table of (element, value) pairs with three entries. Here is a simplified list of the returned elements:

[
{
    beta_kubernetes_io_arch = "amd64",
    beta_kubernetes_io_instance_type = "m5a.xlarge",
    beta_kubernetes_io_os = "linux",
    container = "POD",
    cpu = "total",
    id = "/kubepods/podSOME-LONG-ID/A-LONG-ID",
    image = "602401143452.dkr.ecr.ap-southeast-1.amazonaws.com/eks/pause-amd64:3.1",
    instance = "MY-INTERNAL-NODE-INSTANCE",
    job = "kubernetes-nodes-cadvisor",
    name = "k8s_POD_MY-POD-IDENTIFIER-WITH-LONGID",
    namespace = "staging",
    pod = "MY-POD-IDENTIFIER"
},
{
    beta_kubernetes_io_arch = "amd64",
    beta_kubernetes_io_instance_type = "m5a.xlarge",
    beta_kubernetes_io_os = "linux",
    container = "MY-CONTAINER-IMAGE-NAME",
    cpu = "total",
    id = "/kubepods/podSOME-LONG-ID/A-DIFFERENT-LONG-ID",
    image = "111111111.dkr.ecr.ap-southeast-1.amazonaws.com/MY-KUBERNETES-CONTAINER-IMAGE",
    instance = "MY-INTERNAL-NODE-INSTANCE",
    job = "kubernetes-nodes-cadvisor",
    name = "k8s_MY-POD-NAME_MY-POD-IDENTIFIER-WITH-LONGID",
    namespace = "staging",
    pod = "MY-POD-IDENTIFIER"
},
{
    beta_kubernetes_io_arch = "amd64",
    beta_kubernetes_io_instance_type = "m5a.xlarge",
    beta_kubernetes_io_os = "linux",
    cpu = "total",
    id = "/kubepods/podSOME-LONG-ID",
    instance = "MY-INTERNAL-NODE-INSTANCE",
    job = "kubernetes-nodes-cadvisor",
    namespace = "staging",
    pod = "MY-POD-IDENTIFIER"
}]

The Graph tab shows three lines, which match up to the list of elements above:

  1. First element: red
  2. Second element: light green
  3. Third element: dark green

My Guesses:

Looking at the list of elements, one of the differentiating factors seems to be the container label. Its values are, respectively: [POD, MY-CONTAINER-IMAGE-NAME, None]. The POD element has CPU usage of 0 throughout, while the other two elements have similar but not identical CPU usage of around 0.6m.

My guess was that the POD element represents a Kubernetes helper container within the Pod, the MY-CONTAINER-IMAGE-NAME element represents the container I defined, and the None element is a wrapper for both of those. This seems likely because of how closely the None element tracks the MY-CONTAINER-IMAGE-NAME element. On the other hand, there is some variation between them, so I'm not sure.
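
One way to test this guess is to filter on the container label directly. The queries below are only a sketch, assuming the label names shown in the list above (container, pod, namespace). They rely on two readings of that output: the pause-amd64 image on the POD element suggests it is the pod's pause (sandbox) container, and the element with no container label has an id that is the parent cgroup path of the other two, which suggests it is a pod-level total.

```promql
# Only the container(s) defined in the pod spec: drop the "POD" element
# and the element that has no container label at all.
rate(container_cpu_usage_seconds_total{pod="MY-POD-IDENTIFIER", container!="", container!="POD"}[5m])

# Per-pod CPU usage, summed over the defined containers only, so that the
# pod-level element is not counted on top of the per-container ones.
sum by (namespace, pod) (
  rate(container_cpu_usage_seconds_total{pod="MY-POD-IDENTIFIER", container!="", container!="POD"}[5m])
)
```

If the guess is right, the first query should return a single element for a single-container pod, and the second should track the None element closely.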

My Question:

Can someone please explain what these different Prometheus elements are? Or point me in the direction of a good resource?

Upvotes: 0

Views: 263

Answers (1)

justincely

Reputation: 1080

The "elements" that you see are the "targets" that Prometheus has discovered. To fully understand how this works, the Prometheus Kubernetes service discovery docs (the kubernetes_sd_config section of the configuration reference) are going to be the most helpful.

Basically (without seeing your Prometheus config, I can't verify which rule applies), Prometheus is finding three targets per pod under some of the rules described on that page. For example, with the pod role it generates one target per declared container port.
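
A quick way to check this without the config is to group the returned elements by the labels that identify a scrape target. This is a minimal sketch, assuming the job and instance labels shown in the question's output identify the target, as they normally do in Prometheus:

```promql
# Count the elements for this pod per scrape target (job + instance).
count by (job, instance) (
  container_cpu_usage_seconds_total{pod="MY-POD-IDENTIFIER"}
)
```

Three rows with value 1 would mean three separate targets. A single row with value 3 would mean all three elements come from one target and differ only in other labels such as container, in which case filtering on those labels in the query is the more direct fix.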

Once you've figured out which of these you want and which you don't, you can use relabel_configs to drop the extra targets so they aren't scraped.

Upvotes: 1
