rshahrami

Reputation: 63

Run Apache Beam on Apache Flink

I want to run Python code using Apache Beam on Apache Flink. The Apache Beam site gives the following command for launching Python code on Apache Flink:

docker run --net=host apachebeam/flink1.9_job_server:latest --flink-master=localhost:8081

The following article discusses the different ways Apache Beam executes code on Apache Flink, but I haven't seen an example of actually launching it:

https://flink.apache.org/ecosystem/2020/02/22/apache-beam-how-beam-runs-on-top-of-flink.html

I want to run this without Docker. What command should I use?

Upvotes: 0

Views: 1549

Answers (1)

Sam Bourne

Reputation: 650

You can spin up the Flink job server directly from the Beam source code. Note that you'll need to have Java installed.

1) Clone the beam source code:

git clone https://github.com/apache/beam.git

2) Start the job server:

cd beam
./gradlew -p runners/flink/1.8/job-server runShadow -PflinkMasterUrl=localhost:8081

Some helpful tips:

The job server is not Flink itself! You'll need to spin up Flink separately.
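Since the job server expects Flink to already be running, a quick reachability check can save some confusion. This is only a minimal sketch: `is_listening` is a hypothetical helper name, and it only tests whether something accepts TCP connections on the port, not whether that something is actually Flink.

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unresolved host)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `is_listening("localhost", 8081)` checks the Flink master address used in the commands above.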

The Flink job server actually spins up a few services:

  • Expansion Service (port 8097): This service lets you use ExternalTransforms in your pipeline that exist in the Java SDK. For example, the transforms in the Python SDK under apache_beam.io.external.* hit this expansion service.
  • Artifact Service (port 8098): This is where the pipeline uploads your Python artifacts (e.g. pickle files) to be used by the Flink taskmanager when it executes your Python code. From what I recall, you must share the artifact staging area (defaults to /tmp/beam-artifact-staging) between the Flink taskmanager and this artifact service.
  • Job Service (port 8099): This is what you submit your pipeline to. It translates your pipeline into something Flink can run and submits it.
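Once the job server is up, submitting a Python pipeline to the Job Service might look roughly like the sketch below. It assumes `apache_beam` is installed and uses Beam's portable runner options (`--runner=PortableRunner`, `--job_endpoint`, `--environment_type`); `LOOPBACK` is an assumption that suits local testing, and the function names `portable_runner_args` and `run_wordcount` are made up for illustration.

```python
def portable_runner_args(job_endpoint="localhost:8099", environment_type="LOOPBACK"):
    """Build pipeline options that point Beam at the job service (port 8099 above)."""
    return [
        "--runner=PortableRunner",
        "--job_endpoint=" + job_endpoint,
        # Assumption: LOOPBACK runs the Python workers in the submitting process,
        # which is convenient for local testing but not for a real cluster.
        "--environment_type=" + environment_type,
    ]


def run_wordcount(argv):
    """Submit a tiny word count pipeline using the given option list."""
    # Imported here so the helper above can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(argv)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "Create" >> beam.Create(["to be or not to be"])
            | "Split" >> beam.FlatMap(lambda line: line.split())
            | "Count" >> beam.combiners.Count.PerElement()
            | "Print" >> beam.Map(print)
        )
```

Calling `run_wordcount(portable_runner_args())` would then submit the pipeline to the job service on localhost:8099, provided both Flink and the job server are running.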

Upvotes: 1
