Alex
Alex

Reputation: 359

Using Python Processors in Java Flink Application

I have a use case where I want to implement an AWS Kinesis Data Application with Flink in Java. It will listen to multiple Kinesis streams via the Data Streams API. However, the analysis of those streams will be done in Python (since our data scientists prefer Python).

From this answer, there appears to be support for calling Python UDFs from Java. However, I want to be able to convert an incoming stream to a table, via

StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
Table sessionsTable = tableEnv.fromDataStream(inputStream);

...and then have a Python processor that is invoked to process that stream.

I really have 3 questions here:

  1. Is this a supported use case?
  2. If so, is there documentation that describes how to do so?
  3. If so, will this add significant overhead to the application?

Upvotes: 1

Views: 519

Answers (1)

David Anderson
David Anderson

Reputation: 43707

The starting point in the Flink documentation for learning about using Python with Tables and Datastreams is at https://ci.apache.org/projects/flink/flink-docs-stable/docs/dev/python/overview/.

The Python APIs only provide a subset of what's available from Java; you'll have to look and see if what you need is included.

Not sure about performance, but you can, for example, convert back and forth between Flink Tables and Pandas dataframes.

Upvotes: 0

Related Questions