I am trying to estimate the end-to-end tuple latency of my events using the latency metrics exported by Flink (I am using the Prometheus metrics reporter). Everything works, and I can see the latency metric in my Grafana/Prometheus dashboard. It looks something like this:
flink_taskmanager_job_latency_source_id_source_subtask_index_operator_id_operator_subtask_index_latency{
host="",instance="",job="",
job_id="",job_name="",operator_id="",operator_subtask_index="0",
quantile="0.99",source_id="",source_subtask_index="0",tm_id=""}
My test job is a simple source->map->sink pipeline with parallelism set to 1. The Flink dashboard shows that all of the operators are chained together into one task. For one run of the job, I see two sets of latency metrics, each showing all quantiles (0.5, 0.95, ...). The only difference between the two sets is the operator_id. I assumed this means one operator_id belongs to the map operator and the other belongs to the sink.
Now my problem is that there is no intuitive way to distinguish between the two (i.e. to find out which operator_id is the map and which is the sink) just by looking at the metrics. So my questions are essentially: is there a way to get the operator names, map and sink, into the latency metric? Even though these names show up in other metrics like numRecordsIn, they do not show up in the latency metric. And if not, is there a way to map between operator_id and operator_name?
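One possible workaround for the mapping question, as a hedged sketch: if, in your setup, an operator-scoped metric such as numRecordsIn carries both an operator_id and an operator_name label (an assumption worth checking in Prometheus before relying on it), you can build an operator_id → operator_name lookup from those series and use it to label the latency series. The sketch works on plain label dictionaries, e.g. the "metric" field of each result entry returned by a Prometheus HTTP API query; the function names and sample IDs are hypothetical.

```python
from typing import Dict, Iterable, List

Labels = Dict[str, str]


def build_operator_name_map(operator_series: Iterable[Labels]) -> Dict[str, str]:
    """Collect operator_id -> operator_name pairs from series that carry both labels.

    Assumes (not guaranteed) that operator-scoped metrics such as numRecordsIn
    expose both labels in your reporter setup.
    """
    mapping: Dict[str, str] = {}
    for labels in operator_series:
        op_id = labels.get("operator_id")
        op_name = labels.get("operator_name")
        if op_id and op_name:
            mapping[op_id] = op_name
    return mapping


def annotate_latency(latency_series: Iterable[Labels], mapping: Dict[str, str]) -> List[Labels]:
    """Return copies of the latency label sets with an added operator_name label."""
    annotated = []
    for labels in latency_series:
        enriched = dict(labels)
        enriched["operator_name"] = mapping.get(labels.get("operator_id", ""), "unknown")
        annotated.append(enriched)
    return annotated


if __name__ == "__main__":
    # Hypothetical label sets; real operator IDs are 32-character hex strings.
    records_in = [
        {"operator_id": "aaaa", "operator_name": "map"},
        {"operator_id": "bbbb", "operator_name": "sink"},
    ]
    latency = [
        {"operator_id": "aaaa", "quantile": "0.99"},
        {"operator_id": "bbbb", "quantile": "0.99"},
    ]
    for series in annotate_latency(latency, build_operator_name_map(records_in)):
        print(series["operator_name"], series["quantile"])
```

Operators whose ID never appears in the name-bearing series are labeled "unknown" rather than dropped, so missing mappings stay visible.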
The operator_id is currently a hash value. It is either computed from the hash values of the operator's inputs and the node itself or, if you have set a UID for an operator via uid, it is computed as the murmur3_128 hash of that ID.
Please open a JIRA issue to add this feature to Flink.
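For the uid case, a minimal Python sketch that predicts the operator_id from a uid, so you can tell which latency series belongs to which operator. It assumes (not stated above) that Flink hashes the UTF-8 bytes of the uid with Guava's Hashing.murmur3_128 using seed 0, and that the operator_id label is the lowercase hex encoding of the resulting 16 bytes (h1 then h2, each little-endian, as Guava emits them). MurmurHash3 x64 128-bit is reimplemented here to keep the sketch dependency-free; the uids "my-map" and "my-sink" are hypothetical. Compare the output with the labels your job actually exports before trusting it. Without a uid, the hash also depends on the graph topology, so this shortcut does not apply.

```python
MASK64 = (1 << 64) - 1
C1 = 0x87C37B91114253D5
C2 = 0x4CF5AD432745937F


def _rotl64(x: int, r: int) -> int:
    return ((x << r) | (x >> (64 - r))) & MASK64


def _fmix64(k: int) -> int:
    # Final avalanche step of MurmurHash3.
    k ^= k >> 33
    k = (k * 0xFF51AFD7ED558CCD) & MASK64
    k ^= k >> 33
    k = (k * 0xC4CEB9FE1A85EC53) & MASK64
    k ^= k >> 33
    return k


def murmur3_x64_128(data: bytes, seed: int = 0) -> bytes:
    """MurmurHash3 x64 128-bit; returns h1 and h2 as little-endian bytes."""
    h1 = h2 = seed & MASK64
    nblocks = len(data) // 16
    for i in range(nblocks):  # body: 16-byte blocks
        k1 = int.from_bytes(data[16 * i:16 * i + 8], "little")
        k2 = int.from_bytes(data[16 * i + 8:16 * i + 16], "little")
        k1 = (_rotl64((k1 * C1) & MASK64, 31) * C2) & MASK64
        h1 ^= k1
        h1 = (_rotl64(h1, 27) + h2) & MASK64
        h1 = (h1 * 5 + 0x52DCE729) & MASK64
        k2 = (_rotl64((k2 * C2) & MASK64, 33) * C1) & MASK64
        h2 ^= k2
        h2 = (_rotl64(h2, 31) + h1) & MASK64
        h2 = (h2 * 5 + 0x38495AB5) & MASK64
    tail = data[16 * nblocks:]  # remaining 0..15 bytes
    if len(tail) > 8:
        k2 = int.from_bytes(tail[8:], "little")
        h2 ^= (_rotl64((k2 * C2) & MASK64, 33) * C1) & MASK64
    if len(tail) > 0:
        k1 = int.from_bytes(tail[:8], "little")
        h1 ^= (_rotl64((k1 * C1) & MASK64, 31) * C2) & MASK64
    h1 ^= len(data)
    h2 ^= len(data)
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    h1, h2 = _fmix64(h1), _fmix64(h2)
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    return h1.to_bytes(8, "little") + h2.to_bytes(8, "little")


def operator_id_for_uid(uid: str) -> str:
    """Hypothetical helper: assumed operator_id label for an operator with this uid."""
    return murmur3_x64_128(uid.encode("utf-8"), seed=0).hex()


if __name__ == "__main__":
    for uid in ("my-map", "my-sink"):  # hypothetical uids set via .uid(...)
        print(uid, operator_id_for_uid(uid))
```

Setting a uid on each operator makes these IDs stable and predictable across runs, which is also what Flink recommends for savepoint compatibility.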