sunitha

Reputation: 1538

Apache Beam on Google Dataflow: collecting metrics from within the main method

I have a batch pipeline that pulls data from a Cassandra table and writes it to Kafka. I would like to gather various statistics about the Cassandra data — for example, the total number of records in the table, the number of records with a null value in a given column, etc. I tried to leverage Beam metrics. Although the correct counts show up in the Google Cloud console after the pipeline finishes, I am unable to read them in the main program after the pipeline.run() call: it throws an UnsupportedOperationException. I am using Google Dataflow and bundle the pipeline as a Flex Template. Is there any way to get this to work?
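For reference, the pattern described — declaring Beam counters inside a DoFn and then querying them from the PipelineResult after run() — looks roughly like this (a minimal local sketch using the Java SDK and the DirectRunner; the class, namespace, and counter names are illustrative, not from the original pipeline). On the DirectRunner this query works; the Dataflow runner supports it only partially, which is consistent with the exception described:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.MetricNameFilter;
import org.apache.beam.sdk.metrics.MetricQueryResults;
import org.apache.beam.sdk.metrics.MetricResult;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.metrics.MetricsFilter;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

public class CassandraStatsExample {

  // Counters are declared once and incremented per element inside the DoFn.
  static class StatsFn extends DoFn<String, String> {
    private final Counter totalRows = Metrics.counter("cassandra_stats", "total_rows");
    private final Counter nullValues = Metrics.counter("cassandra_stats", "null_values");

    @ProcessElement
    public void processElement(@Element String row, OutputReceiver<String> out) {
      totalRows.inc();
      if (row.isEmpty()) {   // stand-in for "column is null"
        nullValues.inc();
      }
      out.output(row);
    }
  }

  // Runs a small in-memory pipeline and returns the attempted value of
  // the "total_rows" counter, queried from the PipelineResult.
  public static long runAndGetTotal() {
    Pipeline p = Pipeline.create();
    p.apply(Create.of("a", "b", "")).apply(ParDo.of(new StatsFn()));

    PipelineResult result = p.run();
    result.waitUntilFinish();

    // This is the call that throws UnsupportedOperationException on some
    // runners/configurations; it works on the DirectRunner.
    MetricQueryResults metrics = result.metrics().queryMetrics(
        MetricsFilter.builder()
            .addNameFilter(MetricNameFilter.named("cassandra_stats", "total_rows"))
            .build());

    long total = 0;
    for (MetricResult<Long> counter : metrics.getCounters()) {
      total += counter.getAttempted();
    }
    return total;
  }

  public static void main(String[] args) {
    System.out.println("total_rows = " + runAndGetTotal());
  }
}
```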

Upvotes: 0

Views: 585

Answers (1)

robertwb

Reputation: 5104

If you can get the job id, Dataflow offers a public API that can be used to query metrics (it is what is used internally). Easier might be to get these from Stackdriver; see, e.g., Collecting Application Metrics From Google Cloud Dataflow.
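A hedged sketch of the first approach, assuming the `google-api-services-dataflow` client library and Application Default Credentials: the job id can be obtained from the PipelineResult (on Dataflow it is a DataflowPipelineJob, which exposes getJobId()), and `projects.locations.jobs.getMetrics` then returns all metrics the service tracks for that job, including user-defined Beam counters. Project, region, and application name below are placeholders.

```java
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.json.gson.GsonFactory;
import com.google.api.services.dataflow.Dataflow;
import com.google.api.services.dataflow.model.JobMetrics;
import com.google.api.services.dataflow.model.MetricUpdate;
import com.google.auth.http.HttpCredentialsAdapter;
import com.google.auth.oauth2.GoogleCredentials;

public class DataflowJobMetricsReader {
  public static void main(String[] args) throws Exception {
    String project = "my-project";   // placeholder
    String region = "us-central1";   // placeholder
    String jobId = args[0];          // e.g. from DataflowPipelineJob.getJobId()

    // Build an authenticated Dataflow API client.
    GoogleCredentials credentials = GoogleCredentials.getApplicationDefault();
    Dataflow dataflow = new Dataflow.Builder(
            GoogleNetHttpTransport.newTrustedTransport(),
            GsonFactory.getDefaultInstance(),
            new HttpCredentialsAdapter(credentials))
        .setApplicationName("metrics-reader")
        .build();

    // Fetch all metrics the Dataflow service recorded for this job.
    JobMetrics metrics = dataflow.projects().locations().jobs()
        .getMetrics(project, region, jobId)
        .execute();

    for (MetricUpdate m : metrics.getMetrics()) {
      System.out.println(m.getName().getName() + " = " + m.getScalar());
    }
  }
}
```

This runs in the main program after pipeline.run(), so the counts are available even when PipelineResult.metrics() itself is unsupported on the runner.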

Upvotes: 1
