Reputation: 1538
I have a batch pipeline which pulls data from a Cassandra table and writes it into Kafka. I would like to compute various statistics on the Cassandra data, for example the total number of records in the table, the number of records with a null value in a particular column, etc. I tried to leverage Beam metrics. Although they show the correct counts in the Google Cloud console after the pipeline has finished, I am unable to read them in the main program after the pipeline.run() call: it throws an unsupported operation exception. I am using Google Cloud Dataflow and the pipeline is bundled as a Flex Template. Is there any way to get this to work?
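For illustration, this is roughly the pattern I am using (a simplified Java sketch, not my real code; the entity class, the getSomeColumn() accessor and the counter names are placeholders):

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.MetricNameFilter;
import org.apache.beam.sdk.metrics.MetricQueryResults;
import org.apache.beam.sdk.metrics.MetricResult;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.metrics.MetricsFilter;
import org.apache.beam.sdk.transforms.DoFn;

public class CassandraStats {

  // Placeholder standing in for the real CassandraIO entity class.
  static class MyRow implements java.io.Serializable {
    String someColumn;
    String getSomeColumn() { return someColumn; }
  }

  // Counters incremented while the Cassandra rows flow through the pipeline.
  static class CountStatsFn extends DoFn<MyRow, MyRow> {
    private final Counter totalRecords = Metrics.counter("cassandra-stats", "totalRecords");
    private final Counter nullColumnValues = Metrics.counter("cassandra-stats", "nullColumnValues");

    @ProcessElement
    public void processElement(ProcessContext c) {
      totalRecords.inc();                        // total number of records
      if (c.element().getSomeColumn() == null) {
        nullColumnValues.inc();                  // records with a null column value
      }
      c.output(c.element());
    }
  }

  // In the main program, after building the pipeline.
  static void runAndPrintStats(Pipeline pipeline) {
    PipelineResult result = pipeline.run();
    result.waitUntilFinish();

    MetricQueryResults metrics = result.metrics().queryMetrics(
        MetricsFilter.builder()
            .addNameFilter(MetricNameFilter.named("cassandra-stats", "totalRecords"))
            .build());
    for (MetricResult<Long> counter : metrics.getCounters()) {
      // This read is where the exception is thrown when the job runs as a Flex Template.
      System.out.println(counter.getName() + ": " + counter.getAttempted());
    }
  }
}
```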
Upvotes: 0
Views: 585
Reputation: 5104
If you can get the job id, Dataflow offers a public API for querying job metrics (the same API the service uses internally). An easier option might be to get these from Stackdriver; see, e.g., Collecting Application Metrics From Google Cloud Dataflow.
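With the generated Dataflow Java client it would look roughly like this (an untested sketch; the project, region, job id and application name are placeholders, and you need the google-api-services-dataflow and google-auth-library dependencies):

```java
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.json.gson.GsonFactory;
import com.google.api.services.dataflow.Dataflow;
import com.google.api.services.dataflow.model.JobMetrics;
import com.google.api.services.dataflow.model.MetricUpdate;
import com.google.auth.http.HttpCredentialsAdapter;
import com.google.auth.oauth2.GoogleCredentials;

public class DataflowMetricsReader {

  public static void main(String[] args) throws Exception {
    // Placeholders -- substitute your own project, region and job id.
    String project = "my-project";
    String region = "us-central1";
    String jobId = "your-dataflow-job-id";

    GoogleCredentials credentials = GoogleCredentials.getApplicationDefault()
        .createScoped("https://www.googleapis.com/auth/cloud-platform");

    Dataflow dataflow = new Dataflow.Builder(
            GoogleNetHttpTransport.newTrustedTransport(),
            GsonFactory.getDefaultInstance(),
            new HttpCredentialsAdapter(credentials))
        .setApplicationName("metrics-reader")
        .build();

    // projects.locations.jobs.getMetrics returns all metrics reported for the job,
    // including user counters defined via Beam's Metrics API.
    JobMetrics metrics = dataflow.projects().locations().jobs()
        .getMetrics(project, region, jobId)
        .execute();

    for (MetricUpdate update : metrics.getMetrics()) {
      System.out.println(update.getName().getName() + " = " + update.getScalar());
    }
  }
}
```

Since this only needs the job id, you can run it from any program after the template-launched job has finished, rather than relying on the PipelineResult returned by pipeline.run().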
Upvotes: 1