Milind Dhoke

Reputation: 569

How can deduplication take place in remote object storage in the Thanos ecosystem?

I am exploring Thanos for an existing monitoring cluster. The Thanos querier can deduplicate, but only at query time. The shipper uploads each Prometheus instance's data to remote object storage, so when Prometheus runs in HA mode, duplicate data is uploaded, and no one wants to store duplicated data. So my question is: does Thanos offer any way to deduplicate data in remote object storage, or is external tooling needed in the cluster?

Upvotes: 0

Views: 3141

Answers (2)

Hedeesa

Reputation: 178

In the Thanos architecture you must define unique external_labels for each Prometheus instance (based on this doc).
Since these labels differ between the Prometheus instances, each instance's metrics are stored as separate series in the object storage.
By specifying --query.replica-label=replica on the querier, it will deduplicate metrics at query time based on that label.
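
To illustrate the idea of deduplicating on a replica label, here is a minimal Python sketch. It is a conceptual simplification and not Thanos's actual algorithm: the `deduplicate` function, the series layout, and the sample values are all hypothetical. It assumes two replicas whose series differ only in the `replica` label and whose timestamps line up exactly, which real HA pairs rarely do.

```python
from collections import defaultdict


def deduplicate(series_list, replica_label="replica"):
    """Merge series that differ only in `replica_label` (illustrative only).

    Each series is a dict of the hypothetical form:
    {"labels": {...}, "samples": [(timestamp, value), ...]}
    """
    merged = defaultdict(dict)
    for series in series_list:
        # Identity of the series once the replica label is ignored.
        key = frozenset(
            (name, value)
            for name, value in series["labels"].items()
            if name != replica_label
        )
        for timestamp, value in series["samples"]:
            # Keep the first sample seen for each timestamp.
            merged[key].setdefault(timestamp, value)
    return [
        {"labels": dict(key), "samples": sorted(samples.items())}
        for key, samples in merged.items()
    ]


# Two HA replicas scraping the same target (made-up data).
series = [
    {"labels": {"__name__": "up", "job": "node", "replica": "A"},
     "samples": [(10, 1.0), (20, 1.0)]},
    {"labels": {"__name__": "up", "job": "node", "replica": "B"},
     "samples": [(20, 1.0), (30, 1.0)]},
]
result = deduplicate(series)
```

Here both replicas collapse into a single series without the `replica` label, and a gap in one replica is filled from the other. Note that this happens on the query result only; both copies remain in object storage.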

Upvotes: 0

Raven

Reputation: 713

This is a real issue with Thanos.
The deduplication logic runs on read, not on write, so at the moment there is no solution except using only one compactor, but then you risk missing data from the other Prometheus.

You can try looking at Cortex, which does its deduplication on write: https://cortexmetrics.io/

Upvotes: 0
