Reputation: 61
I am trying to programmatically find out the status of a Dataflow job while it's running. That is, I would like to poll the batch/streaming job and, once it has completed, trigger the next job. Is there a Java/Scala API that I can use to accomplish this? I have already tried using com.google.api.services.dataflow.Dataflow$Projects$Locations$Jobs and am able to retrieve the JobMetrics, but I am trying to get more information, such as the job's input and output metrics, to determine whether the streaming job has processed all data.
[EDITED - Based on follow-up question]. These are the signals I am trying to get from the dataflow jobs while they are running:
For Batch Jobs: Only if the job is running, completed or failed. Based on these, I can decide whether to wait, trigger the next job, or not proceed with the next job, respectively.
For Streaming Jobs: As I want to decide whether the job has processed all necessary data, maybe I can try to get the following information:
1. Data Freshness
2. No. of elements ingested/read (Input Metrics)
3. Elements written (Output Metrics)
When checking the streaming jobs manually, we conclude that a job is "done" when we see that the throughput has come down to zero and hasn't changed for a few minutes. I would like to do the same when automating this.
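To make that manual rule concrete, here is a minimal sketch of the check I have in mind. The class name StallDetector and its observe method are hypothetical, and the element count passed in is assumed to come from whatever cumulative metric the job exposes; the sketch does not call Dataflow itself.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper: reports "done" once a cumulative counter has stopped changing. */
final class StallDetector {
    private final Duration quietPeriod; // how long the count must stay unchanged
    private long lastCount;             // last observed cumulative element count
    private Instant lastChange;         // time of the last observed change (null before the first sample)

    StallDetector(Duration quietPeriod) {
        this.quietPeriod = quietPeriod;
    }

    /**
     * Records one sample and returns true if the count has stayed the same
     * for at least the quiet period, i.e. throughput has been zero that long.
     */
    boolean observe(long elementCount, Instant now) {
        if (lastChange == null || elementCount != lastCount) {
            // First sample, or the job is still making progress: restart the clock.
            lastCount = elementCount;
            lastChange = now;
            return false;
        }
        return Duration.between(lastChange, now).compareTo(quietPeriod) >= 0;
    }
}
```

The idea would be to call observe on every poll with the latest count and the current time, and to treat the first true result as the signal to trigger the next job. The length of the quiet period is a judgment call that depends on how bursty the source is.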
Is there anything available that can help me?
Upvotes: 1
Views: 1152
Reputation: 7287
Apologies, I'm not fluent in Java, but I can point you to the Dataflow API endpoints that return the details you need. My examples send HTTP requests to the Dataflow API using curl.
Job Status (running, completed, failed)
The job's state is reported in currentState. See currentState to view all states available for the job. This includes JOB_STATE_DONE, JOB_STATE_RUNNING, JOB_STATE_FAILED, etc.
Elements ingested / written
For the ReadFromPubSub step, you can find the number of messages that it read. See the screenshot below for what the object looks like and what it looks like on the UI. The scalar field contains the value of "Elements Added" (the values are not the same because the job is continuously streaming):
Data freshness
I hope this information points you in the right direction and that you will be able to implement it using Java.
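On the Java side, the decision itself is small once currentState has been fetched. The following is only a rough sketch and does not call the Dataflow API: NextStep and Action are made-up names, the state string is assumed to be supplied by the caller, and treating every state other than done, failed or cancelled as "keep waiting" is an assumption that may need adjusting.

```java
/** Hypothetical helper that maps a job's currentState string to the next action. */
final class NextStep {
    enum Action { WAIT, TRIGGER_NEXT, DO_NOT_PROCEED }

    static Action decide(String currentState) {
        if (currentState == null) {
            return Action.WAIT; // no state reported yet: poll again
        }
        switch (currentState) {
            case "JOB_STATE_DONE":
                return Action.TRIGGER_NEXT;
            case "JOB_STATE_FAILED":
            case "JOB_STATE_CANCELLED":
                return Action.DO_NOT_PROCEED;
            default:
                // JOB_STATE_RUNNING and any other state (assumption): keep polling.
                return Action.WAIT;
        }
    }
}
```

A polling loop would call decide with the latest state, sleep between polls while the result is WAIT, and stop on the first other result.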
Upvotes: 1