Reputation: 569
I am exploring Thanos for an existing monitoring cluster. The Thanos querier can perform deduplication, but only at query time. When the shipper sends data to remote object storage, the data from every Prometheus instance is shipped. When Prometheus runs in HA mode, the shipper therefore uploads duplicate data, and no one wants to store duplicated data. So my question is: does Thanos offer any way to deduplicate data in remote object storage, or is some external tooling needed in the cluster?
Upvotes: 0
Views: 3141
Reputation: 178
In the Thanos architecture, each Prometheus instance must define unique external_labels (based on this doc).
Because these labels differ between Prometheus instances, the data from each replica is stored as separate series in the object storage.
By specifying --query.replica-label=replica on the querier, it will deduplicate metrics at query time based on that label.
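To make the idea concrete, here is a toy Python sketch of query-time deduplication by replica label. It is not Thanos's actual algorithm or API; the function name `deduplicate`, the sample data, and the "first value wins" rule are all assumptions made for illustration. It only shows why two stored series that differ in nothing but the `replica` external label can be presented as one series on read.

```python
from collections import defaultdict


def deduplicate(series_list, replica_label="replica"):
    """Merge series that differ only in the replica label (illustrative only)."""
    merged = defaultdict(dict)
    for labels, samples in series_list:
        # Identity of the series once the replica label is ignored.
        key = frozenset((k, v) for k, v in labels.items() if k != replica_label)
        for timestamp, value in samples:
            # Assumption: keep the first value seen for each timestamp.
            merged[key].setdefault(timestamp, value)
    return {key: sorted(samples.items()) for key, samples in merged.items()}


# Two HA replicas scraping the same target; only the external label differs.
replica_a = ({"__name__": "up", "cluster": "eu1", "replica": "A"}, [(10, 1.0), (20, 1.0)])
replica_b = ({"__name__": "up", "cluster": "eu1", "replica": "B"}, [(10, 1.0), (30, 1.0)])

result = deduplicate([replica_a, replica_b])
for key, samples in result.items():
    print(dict(key), samples)
```

Both input series remain stored separately in this sketch; only the view returned to the reader is merged, which matches the point that the querier deduplicates on read rather than on write.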
Upvotes: 0
Reputation: 713
This is a real issue with Thanos.
Deduplication is done on read, not on write, so at the moment there is no solution except using only one compactor, but then you risk missing data from the other Prometheus.
You can try looking at Cortex, which does its deduplication on write: https://cortexmetrics.io/
Upvotes: 0