sdobedev

Reputation: 31

Prometheus + Longhorn = wrong volume size

I am not really sure if this is a Prometheus issue, just Longhorn, or maybe a combination of the two.

Setup:

Problem:

Infinitely growing PV in Longhorn, even over the defined max size. Currently using 75G on a 50G volume.

Description:

I have a really small 3-node cluster with not too many deployments running. Currently there is only one "real" application; the rest is just Kubernetes system stuff so far.
Apart from etcd, I am using all the default scraping rules.
The PV is filling up a bit more than 1 GB per day, which seems fine to me.

The problem is that, for whatever reason, the space used inside Longhorn keeps growing. I have configured retention rules for the Helm chart with retention: 7d and retentionSize: 25GB, so the retentionSize should never be reached anyway.
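For completeness, this is roughly how those values are set, assuming the kube-prometheus-stack chart (the release name and namespace below are placeholders):

helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring --reuse-values \
  --set prometheus.prometheusSpec.retention=7d \
  --set prometheus.prometheusSpec.retentionSize=25GB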
When I log into the container's shell and run du -sh in /prometheus, it shows ~8.7 GB being used, which looks good to me as well.
The problem is that when I look at the Longhorn UI, the used space keeps growing. The PV has existed for ~20 days now and is currently using almost 75 GB of a defined maximum of 50 GB. When I inspect the folder on the Kubernetes node itself where Longhorn stores its PV data, I see the same space usage as in the Longhorn UI, while inside the Prometheus container everything looks fine to me.
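For reference, this is roughly how I compared the two numbers (the pod name and volume ID are placeholders, and /var/lib/longhorn is the Longhorn default data path):

# size as seen inside the Prometheus container
kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-0 \
  -c prometheus -- du -sh /prometheus

# size of the replica data Longhorn stores on the node
sudo du -sh /var/lib/longhorn/replicas/pvc-<volume-id>-*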

I hope someone has an idea what the problem could be. I have not experienced this issue with any other deployment so far; all the others behave correctly and actually shrink in used size when something inside the container gets deleted.

Upvotes: 1

Views: 1318

Answers (2)

Tom W

Reputation: 21

I had the same problem recently, and it was because Longhorn does not automatically reclaim blocks that are freed by your application, i.e. Prometheus. This causes the volume's actual size to grow indefinitely, beyond the configured size of the PVC. This is explained in the Longhorn Volume Actual Size documentation. You can trigger Longhorn to reclaim these blocks using the Trim Filesystem feature, which should bring the size back down to what you see used within the container. You can also set this up to run on a schedule to maintain it over time.
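A minimal sketch of such a schedule, assuming Longhorn v1.5 or newer (where the filesystem-trim recurring job task is available); the job name, cron expression, and use of the default group are just examples:

kubectl apply -f - <<'EOF'
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: trim
  namespace: longhorn-system
spec:
  task: filesystem-trim   # reclaim blocks freed inside the volume's filesystem
  cron: "0 3 * * *"       # every day at 03:00
  retain: 0               # not applicable to trim jobs
  concurrency: 1
  groups:
    - default             # volumes in the default group get this job
EOF

A one-off trim can also be triggered from the Longhorn UI via the volume's Trim Filesystem action.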

Late response, but hopefully it helps anyone else faced with the same issue in the future.

Upvotes: 2

JcGKitten

Reputation: 3

Could the snapshots be the reason for the increasing size? As I understand it, Longhorn takes snapshots, and they are added to the total actual size used on the node whenever the data in a snapshot differs from the current data in the volume. That happens in your case, because old metrics are deleted and new ones are received.

See this comment and this one.
I know I'm answering late, but I came across the same issue and maybe this helps someone.
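If snapshots are the cause, here is a minimal sketch of pruning them on a schedule, assuming Longhorn v1.5 or newer (where the snapshot-delete recurring job task exists); the job name, cron expression, and retain count are just examples:

kubectl apply -f - <<'EOF'
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: snapshot-prune
  namespace: longhorn-system
spec:
  task: snapshot-delete   # remove and purge snapshots beyond the retain count
  cron: "0 4 * * *"       # every day at 04:00
  retain: 1               # keep at most one snapshot per volume
  concurrency: 1
  groups:
    - default
EOF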

Upvotes: 0
