Manu

Reputation: 364

Container CPU Usage is higher than Node CPU Usage

Background

I am trying to distribute the energy consumption of a whole system (e.g. a Raspberry Pi) across the Pods of serverless functions, but I am receiving weird results. The energy measurement setup is already in place, and for a simple start I have a single Pod of a serverless function, let's name it analyze-sentence, deployed with OpenFaaS on Kubernetes.

I use Prometheus with node-exporter and cAdvisor to get metrics indicating the CPU usage of my Kubernetes nodes and of the containers. For the energy consumption I have written my own custom exporter that provides corresponding metrics.

What I've tried

I thought I would come up with a simple formula first that only takes CPU usage into account. It is composed of the CPU usage of the system, the number of CPU cores, the CPU usage of the Pod, and the measured energy consumption.

With these I can first compute the CPU usage of the Pod relative to the total CPU usage of the whole system (both expressed as percentages):

(CPU Usage (Pod) / Number of Cores) / CPU Usage (System)

which should return a value in the interval [0, 1], and then I can multiply this fraction by the measured energy consumption.
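
As a sketch of what I intend to compute, with made-up numbers and illustrative variable names (only the formula itself is my actual approach):

pod_cpu_cores = 0.8        # rate(container_cpu_usage_seconds_total[1m]) -> cores used by the Pod
num_cores = 8              # machine_cpu_cores
system_cpu_percent = 15.0  # 100 - idle percentage reported via node-exporter
energy_as = 19.3           # increase of the power counter over the last minute, in ampere-seconds

pod_share = (pod_cpu_cores / num_cores * 100) / system_cpu_percent  # should land in [0, 1]
pod_energy_as = pod_share * energy_as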

My idea was to take the past minute into account when retrieving the metrics from Prometheus. For the CPU usage it's probably better to get an average using the rate function, so I want the average CPU usage of the system over the past minute, the average CPU usage of the Pod over the past minute, and so on.

The values are computed using the following PromQL queries (let's assume the instance is raspberrypi):

Node CPU usage (percent, averaged over the last minute):
100 - (avg by (instance) (rate(node_cpu_seconds_total{job='node-exporter', instance='raspberrypi', mode='idle'}[1m])) * 100) > 0

Number of CPU cores:
machine_cpu_cores{node='raspberrypi'}

Container CPU usage (cores, averaged over the last minute):
rate(container_cpu_usage_seconds_total{container='analyze-sentence', image!='', container_name!='POD'}[1m]) > 0

Energy consumed during the last minute (ampere-seconds):
idelta(powerexporter_power_consumption_ampere_seconds_total{instance='raspberrypi'}[2m:1m])
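
I retrieve these values through the instant query endpoint of the Prometheus HTTP API; a minimal sketch of that part (the server address is a placeholder, my actual script makes the same kind of call for each query above):

import requests

PROMETHEUS = 'http://localhost:9090'   # placeholder for the Prometheus address

def instant_query(expr):
    # /api/v1/query evaluates the expression at the current server time and
    # returns an [evaluation_timestamp, value] pair per matching series
    resp = requests.get(f'{PROMETHEUS}/api/v1/query', params={'query': expr})
    resp.raise_for_status()
    return [(float(r['value'][0]), float(r['value'][1])) for r in resp.json()['data']['result']]

num_cores = instant_query("machine_cpu_cores{node='raspberrypi'}")[0][1]
pod_cpu_cores = instant_query("rate(container_cpu_usage_seconds_total{container='analyze-sentence', image!='', container_name!='POD'}[1m]) > 0")[0][1]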

idelta takes the last two samples of a range and computes their difference. With the [2m:1m] subquery I only get two samples anyway: the value of the energy counter at the current minute and at the previous minute. So this should give me the amount of energy consumed within the past 60 seconds.
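
In other words, what I expect from that query is simply the difference of two counter readings (illustrative numbers):

counter_one_minute_ago = 1000.0   # powerexporter_power_consumption_ampere_seconds_total at t-60s
counter_now = 1019.3              # the same counter at t
energy_last_minute_as = counter_now - counter_one_minute_ago   # what idelta should return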

The Problem

I am receiving weird results regarding the CPU usage. Sometimes the CPU usage of the Pod is higher than the CPU usage of the system, which obviously doesn't make any sense. At first I thought the timestamps of the individual metrics weren't the same, but this is not the case. Here is a sample result after querying the Prometheus REST API for the needed data:

2022-07-30 13:36:05,840 - __main__ - INFO >>> CPU Cores Query >>> [Timestamp: 1659180963.405 | Number of Cores: 8]
2022-07-30 13:36:05,938 - __main__ - INFO >>> Node CPU Usage Query >>> [Timestamp: 1659180963.503 | CPU Usage: 15.909242428069987 %]
2022-07-30 13:36:06,029 - __main__ - INFO >>> Container CPU Usage Query >>> [Timestamp: 1659180963.594 | CPU Usage: 1.4602082000000034 Cores]
2022-07-30 13:36:06,116 - __main__ - INFO >>> Energy Consumption Query >>> [Timestamp: 1659180963.68 | Energy Consumption: 19.318549297684513 As]
2022-07-30 13:36:06,116 - __main__ - INFO >>> Container CPU Usage (Percentage) relative to the complete node: 18.25260250000004 %
2022-07-30 13:36:06,116 - __main__ - INFO >>> Energy Consumption of analyze-sentence: 22.164084984030715 As

I query the Prometheus REST API every 60 seconds. I'm only getting weird results occasionally; most of the time they make sense. But I can't explain why it happens at all: no matter when I query the Prometheus API, the average CPU usage of the system should always be higher than the average CPU usage of a single Pod, right? Do you have any idea where the issue is? Wrong data? Wrong queries? Something wrong with my approach?

Upvotes: 0

Views: 1920

Answers (1)

SYN

Reputation: 5041

One way to explain this could be that Prometheus gets your node metrics from node-exporter, and your container metrics from cAdvisor.

There's no guarantee Prometheus scrapes metrics from both services at the same time. Prometheus will try to scrape each job once every scrape_interval.

Each target has its metrics collected at some point within that interval, but not necessarily in the exact same second. When comparing values from different sources, glitches like this can happen.
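
You could check how far apart the scrapes are by comparing the timestamps of the latest samples from both sources; a rough sketch (the address and selectors are just examples, timestamp() is a standard PromQL function):

import requests

PROMETHEUS = 'http://localhost:9090'   # adjust to your setup

def last_scrape_time(selector):
    # timestamp() returns the time at which the most recent sample of each
    # matching series was scraped; take the first series for simplicity
    resp = requests.get(f'{PROMETHEUS}/api/v1/query',
                        params={'query': f'timestamp({selector})'})
    resp.raise_for_status()
    return float(resp.json()['data']['result'][0]['value'][1])

node_ts = last_scrape_time("node_cpu_seconds_total{instance='raspberrypi', mode='idle', cpu='0'}")
pod_ts = last_scrape_time("container_cpu_usage_seconds_total{container='analyze-sentence'}")
print(f'scrape time difference: {abs(node_ts - pod_ts):.3f}s')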

Upvotes: 0
