Reputation: 560
I am using apache beam DirectRunner to load data from kafka topic. My code is below:
conf={'bootstrap.servers':'localhost:9092'}
with beam.Pipeline() as pipeline:
(pipeline
| ReadFromKafka(consumer_config=conf,topics=['topic1'])
)
i am using below command to run this code:
python3 topic_to_gcs --runner DirectRunner
Getting below error:
File "/usr/lib/python3.7/subprocess.py", line 1522, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'docker': 'docker'
Thanks in advance :)
Upvotes: 2
Views: 593
Reputation: 131
Currently, Apache Beam uses so-called external transform to read from Kafka in Python SDK. It actually means, that your Python Pipeline will spawn a Java container and connect to Kafka from inside the container. It will then pass the data back to your Python Pipeline (more about this here).
If you can install docker on your host machine, where you run your pipeline (and on all other where you plan to run it, if you change your runner from DirectRunner to some distributed one), then that would be the best option to go.
Otherwise you can read about the current state in my answer here.
Upvotes: 2