Joseph N
Joseph N

Reputation: 560

Error while using KafkaIO in apache beam DirectRunner

I am using apache beam DirectRunner to load data from kafka topic. My code is below:

conf={'bootstrap.servers':'localhost:9092'}

with beam.Pipeline() as pipeline:
        (pipeline
        |       ReadFromKafka(consumer_config=conf,topics=['topic1'])
        )

i am using below command to run this code:

python3 topic_to_gcs --runner DirectRunner

Getting below error:

File "/usr/lib/python3.7/subprocess.py", line 1522, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'docker': 'docker'

Thanks in advance :)

Upvotes: 2

Views: 593

Answers (1)

Jan Lukavsky
Jan Lukavsky

Reputation: 131

Currently, Apache Beam uses so-called external transform to read from Kafka in Python SDK. It actually means, that your Python Pipeline will spawn a Java container and connect to Kafka from inside the container. It will then pass the data back to your Python Pipeline (more about this here).

If you can install docker on your host machine, where you run your pipeline (and on all other where you plan to run it, if you change your runner from DirectRunner to some distributed one), then that would be the best option to go.

Otherwise you can read about the current state in my answer here.

Upvotes: 2

Related Questions