Achaius

Reputation: 6124

How to get the root cause of a Dataflow pipeline job failure

I am running my pipeline on Dataflow and want to collect all error messages from a Dataflow job using its ID. I am using Apache Beam 2.3.0 and Java 8.

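// Cast the stored pipeline result to the Dataflow-specific job type.
DataflowPipelineJob dataflowPipelineJob = (DataflowPipelineJob) entry.getValue();
String jobId = dataflowPipelineJob.getJobId();
// DataflowClient.create expects a DataflowPipelineOptions instance.
DataflowClient client = DataflowClient.create(options);
Job job = client.getJob(jobId);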

Is there any way to retrieve only the error messages from the pipeline?

Upvotes: 0

Views: 706

Answers (1)

Scott Wegner

Reputation: 7493

Programmatic support for reading Dataflow log messages is not very mature, but there are a couple of options:

  1. Since you already have the DataflowPipelineJob instance, you could use the waitUntilFinish() overload that accepts a JobMessagesHandler parameter to filter and capture error messages. DataflowPipelineJob uses this overload in its own waitUntilFinish() implementation, which serves as a reference.

  2. Alternatively, you can query job logs using the Dataflow REST API method projects.jobs.messages.list. It accepts a minimumImportance parameter, so setting it to JOB_MESSAGE_ERROR returns only error messages.

Note that in both cases, some of the error messages may be non-fatal and may not be the direct cause of the job failure.
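
As a rough sketch of the second option, the request URL for projects.jobs.messages.list can be built as shown below. The class name, method name, and the project and job IDs are hypothetical placeholders; the sketch only assembles the URL and assumes the IDs contain URL-safe characters. Actually sending the GET request would also require an OAuth 2.0 bearer token, and the JSON response should carry the messages in a jobMessages array with messageText and messageImportance fields.

```java
// Minimal sketch: builds the Dataflow REST URL that lists only error-level job messages.
// The class and method names are illustrative, not part of any Google or Beam library.
class DataflowErrorMessagesUrl {

    // Base endpoint of the Dataflow REST API (version v1b3).
    private static final String BASE = "https://dataflow.googleapis.com/v1b3";

    /**
     * Returns the projects.jobs.messages.list URL for the given job,
     * restricted to messages of importance JOB_MESSAGE_ERROR.
     * Assumes projectId and jobId need no URL escaping.
     */
    public static String errorMessagesUrl(String projectId, String jobId) {
        return BASE
                + "/projects/" + projectId
                + "/jobs/" + jobId
                + "/messages?minimumImportance=JOB_MESSAGE_ERROR";
    }

    public static void main(String[] args) {
        // Placeholder IDs; substitute the real project and the value of getJobId().
        System.out.println(errorMessagesUrl("my-project", "my-job-id"));
    }
}
```

The jobId from dataflowPipelineJob.getJobId() in the question could be passed straight into this helper. If a response contains a nextPageToken, the same request would need to be repeated with a pageToken parameter to collect the remaining messages.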

Upvotes: 1
