Reputation: 363
Situation: I just set up Prometheus for my Kubernetes cluster, and wanted to take a closer look at resource usage. Seemed like the way to do this was with container_cpu_usage_seconds_total, but I'm confused by the results.
Problem: Prometheus seems to be returning three different elements for each (single-container) pod. I don't know what these different elements represent, and haven't been able to find a clear explanation on Google.
What I'm doing:
I'm executing this query, where MY-POD-IDENTIFIER is the name of a Kubernetes Pod with one defined container:
rate(container_cpu_usage_seconds_total{pod="MY-POD-IDENTIFIER"}[5m])
What I'm seeing:
The console tab gives me a table of (element, value) with three entries. I've included a simplified list of the returned elements below.
[
{
beta_kubernetes_io_arch = "amd64",
beta_kubernetes_io_instance_type = "m5a.xlarge",
beta_kubernetes_io_os = "linux",
container = "POD",
cpu = "total",
id = "/kubepods/podSOME-LONG-ID/A-LONG-ID",
image = "602401143452.dkr.ecr.ap-southeast-1.amazonaws.com/eks/pause-amd64:3.1",
instance = "MY-INTERNAL-NODE-INSTANCE",
job = "kubernetes-nodes-cadvisor",
name = "k8s_POD_MY-POD-IDENTIFIER-WITH-LONGID",
namespace = "staging",
pod = "MY-POD-IDENTIFIER"
},
{
beta_kubernetes_io_arch = "amd64",
beta_kubernetes_io_instance_type = "m5a.xlarge",
beta_kubernetes_io_os = "linux",
container = "MY-CONTAINER-IMAGE-NAME",
cpu = "total",
id = "/kubepods/podSOME-LONG-ID/A-DIFFERENT-LONG-ID",
image = "111111111.dkr.ecr.ap-southeast-1.amazonaws.com/MY-KUBERNETES-CONTAINER-IMAGE",
instance = "MY-INTERNAL-NODE-INSTANCE",
job = "kubernetes-nodes-cadvisor",
name = "k8s_MY-POD-NAME_MY-POD-IDENTIFIER-WITH-LONGID",
namespace = "staging",
pod = "MY-POD-IDENTIFIER"
},
{
beta_kubernetes_io_arch = "amd64",
beta_kubernetes_io_instance_type = "m5a.xlarge",
beta_kubernetes_io_os = "linux",
cpu = "total",
id = "/kubepods/podSOME-LONG-ID",
instance = "MY-INTERNAL-NODE-INSTANCE",
job = "kubernetes-nodes-cadvisor",
namespace = "staging",
pod = "MY-POD-IDENTIFIER"
}]
The graph tab gives me a view whose lines match up to the previous list of elements.
My Guesses:
Looking at the list of elements, one of the differentiating factors seems to be the container value. This is respectively: [POD, MY-CONTAINER-IMAGE-NAME, None]. The POD element has CPU usage of 0 throughout, while the other elements have similar-but-varying CPU usage of 0.6m.
My guess was that the POD element represents a Kubernetes helper container within the Pod, the MY-CONTAINER-IMAGE-NAME element represents the container I defined, and the None element is a wrapper for both of those. This seems likely because of how closely the None element tracks the MY-CONTAINER-IMAGE-NAME element. But on the other hand, there is variation between them. So I'm not sure.
My Question:
Can someone please explain what these different Prometheus elements are? Or point me in the direction of a good resource?
Upvotes: 0
Views: 263
Reputation: 1080
The "elements" that you see are the "targets" that Prometheus has discovered. To fully understand how/what this is, the Prometheus Kubernetes Service Discovery docs are gonna be the most helpful.
Basically (without seeing your Prometheus config to verify which one), Prometheus is finding three targets per pod according to some of the rules you see on that page. E.g. with pods it will find one target per exposed port.
After you've figured out which of these you want and which you don't need, you can use relabel_configs to drop the extra targets that you don't need to scrape.
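If the goal is only to narrow down what the query returns, rather than to change what gets scraped, a label matcher in the query itself may be enough. As a minimal sketch, assuming the container label takes the three values shown in the question (POD, the container's own name, and empty), this variant of the original query would keep only the element for the container defined in the pod spec:

```promql
# Assumption: the three elements differ only in the "container" label,
# as in the question's output (POD, the container name, or no label at all).
rate(
  container_cpu_usage_seconds_total{
    pod="MY-POD-IDENTIFIER",
    container!="",     # drop the element that has no container label
    container!="POD"   # drop the element labeled POD
  }[5m]
)
```

Whether those are the right elements to exclude depends on what each one represents in your setup, so it's worth comparing the filtered result against the unfiltered one first.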
Upvotes: 1