Reputation: 65
I have an error in Dataflow: "Error processing pipeline".
When I run this code with DirectRunner it works, but when I run it with DataflowRunner I get this error:
This is the error:
ERROR:apache_beam.runners.dataflow.dataflow_runner:Console URL: https://console.cloud.google.com/dataflow/jobs/<RegionId>/2023-04-05_02_00_47-7238238223888513941?project=<ProjectId>
Traceback (most recent call last):
  File "test.py", line 51, in <module>
    run()
  File "test.py", line 44, in run
    output | 'Write' >> WriteToText("gs://<bucket>/output/wc.txt")
  File "/opt/py38/lib64/python3.8/site-packages/apache_beam/pipeline.py", line 601, in __exit__
    self.result.wait_until_finish()
  File "/opt/py38/lib64/python3.8/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 1555, in wait_until_finish
    raise DataflowRuntimeException(
apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
Error processing pipeline.
In IAM I have this role:
The code is:
import os
import apache_beam as beam
from apache_beam.io import WriteToText
from apache_beam.options.pipeline_options import PipelineOptions


def pipelineOptions(pipeline_args):
    pipeline_options = PipelineOptions(
        pipeline_args,
        runner="DirectRunner",
        project="<project-name>",
        job_name="testbigquery",
        temp_location="<temp-location>",
        region="<region>"
    )
    return pipeline_options


def run(argv=None):
    print("Start Process")
    pipeline_options = pipelineOptions(argv)
    pipeline = beam.Pipeline(options=pipeline_options)
    with pipeline as p:
        lines = p
        counts = (
            lines
            | 'Split' >> beam.Create(["test", "fix", "test"])
            | 'PairWithOne' >> beam.Map(lambda x: (x, 1))
            | 'GroupAndSum' >> beam.CombinePerKey(sum))

        def format_result(word, count):
            return '%s: %d' % (word, count)

        output = counts | 'Format' >> beam.MapTuple(format_result)
        output | 'Write' >> WriteToText("gs://<bucket>/output/wc.txt")
    print("End Process")


if __name__ == '__main__':
    run()
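For reference, the Split/PairWithOne/GroupAndSum steps above are a standard word count. A minimal plain-Python sketch of the same logic (not part of the original question, and not using Beam) shows what the pipeline computes before writing to GCS:

```python
from collections import Counter

# Same logic as the Beam pipeline: pair each element with 1,
# then sum the counts per key.
elements = ["test", "fix", "test"]
counts = Counter(elements)  # equivalent to Map((x, 1)) + CombinePerKey(sum)

# Mirror of format_result in the pipeline, sorted for stable output.
for word, count in sorted(counts.items()):
    print('%s: %d' % (word, count))
```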
Upvotes: 0
Views: 733
Reputation: 842
As mentioned by @luca, who is also the OP, the error was resolved by adding the --service_account_email
parameter with the correct service account when using DataflowRunner.
Posting this as a Community Wiki answer since it was answered in the comments but not posted as an answer, for the benefit of community members who might encounter this use case in the future.
Please feel free to edit this answer for additional information and if there are other possible workarounds/direct solutions for this use case.
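As a sketch of what the fixed invocation might look like, assuming the runner and other options are set in the script as in the question (the service account name below is a placeholder, not a value from this thread):

```shell
python test.py \
  --service_account_email=<worker-sa>@<ProjectId>.iam.gserviceaccount.com
```

The service account passed here becomes the identity of the Dataflow workers, so it needs the permissions required by the job (for example, write access to the gs://<bucket> output location).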
Upvotes: 0