Piotr Gwiazda

Reputation: 12222

How to "submit" an ad-hoc SQL to Beam on Flink

I'm using Apache Beam with the Flink runner and the Java SDK. It seems that deploying a job to Flink means building an 80-megabyte fat jar that gets uploaded to the Flink job manager. Is there a way to easily deploy a lightweight SQL query to run with Beam SQL? Maybe deploy a job that can somehow receive and run ad hoc queries? For context, the pipeline I'm packaging looks roughly like the sketch below.
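A minimal sketch of that kind of job, using Beam's SqlTransform; the class name and sample schema are illustrative, not my real pipeline:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.sql.SqlTransform;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

public class HardcodedSqlJob {
  public static void main(String[] args) {
    // Runner is chosen at submit time, e.g. --runner=FlinkRunner
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    Schema schema = Schema.builder()
        .addStringField("name")
        .addInt32Field("score")
        .build();

    PCollection<Row> input = p.apply(
        Create.of(
                Row.withSchema(schema).addValues("alice", 3).build(),
                Row.withSchema(schema).addValues("bob", 5).build())
            .withRowSchema(schema));

    // The query is compiled into the jar; changing it means
    // rebuilding and re-uploading the whole fat jar.
    input.apply(SqlTransform.query(
        "SELECT name FROM PCOLLECTION WHERE score > 4"));

    p.run().waitUntilFinish();
  }
}
```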

Upvotes: 1

Views: 491

Answers (1)

Anton

Reputation: 2539

I don't think it's possible at the moment, if I understand your question correctly. Right now the Beam SDK always builds a fat jar that implements the pipeline and bundles all of its dependencies, and it cannot accept lightweight ad-hoc queries.
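The closest workaround I can think of is to pass the query string in as a pipeline option, so at least the fat jar doesn't have to be rebuilt for every new query (you still deploy the jar through the runner's normal mechanism). A minimal sketch; the QueryOptions interface and the --query flag are my own invention, not a Beam feature:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.sql.SqlTransform;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

public class ParameterizedSqlJob {

  // Hypothetical options interface: the SQL text arrives as a
  // job argument (--query=...) instead of being compiled in.
  public interface QueryOptions extends PipelineOptions {
    @Description("SQL to run against PCOLLECTION")
    String getQuery();
    void setQuery(String value);
  }

  public static void main(String[] args) {
    QueryOptions options =
        PipelineOptionsFactory.fromArgs(args).as(QueryOptions.class);
    Pipeline p = Pipeline.create(options);

    Schema schema =
        Schema.builder().addStringField("name").addInt32Field("score").build();

    PCollection<Row> input = p.apply(
        Create.of(Row.withSchema(schema).addValues("alice", 3).build())
            .withRowSchema(schema));

    // The same fat jar can now run a different query per submission,
    // but the jar itself still has to be built and uploaded.
    input.apply(SqlTransform.query(options.getQuery()));

    p.run().waitUntilFinish();
  }
}
```

You would then submit the same jar each time with something like `--runner=FlinkRunner --query="SELECT name FROM PCOLLECTION WHERE score > 4"`. This only varies the query text; it does not make the deployment itself lightweight.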

If you're interested in a more interactive experience in general, you can look at the ongoing efforts to make Beam more interactive, for example:

  • SQL shell: https://s.apache.org/beam-sql-packaging . This describes a work-in-progress Beam SQL shell, which should let you quickly execute small SQL queries locally in a REPL environment, so you can interactively explore your data and design the pipeline before submitting a long-running job. It does not change how the job gets submitted to Flink (or any other runner), though, so after you submit the long-running job you will likely still have to use your usual job management tools to control it.

  • Python: https://s.apache.org/interactive-beam . Describes an approach that wraps an existing runner in an interactive wrapper.

Upvotes: 2
