peterschrott
peterschrott

Reputation: 592

Apache Flink: Enrich stream with data from external/blocking call

In my application, I want to enrich an infinite stream of events. The stream itself is parallelised by hashing of an Id. For every event, there might be a call to an external source (e.g. REST, DB). This call is blocking by nature. The order of events within one stream partition must be maintained.

My idea was to create a RichMapFunction, which sets up the connection and then polls the external source for each event. The blocking call usually takes not to long, but in the worst case, the service could be down.

Theoretically, this works, but I don't feel good doing it this way, as I don't know how Flink reacts if you have some blocking operations within the stream. And what happens if you have a lot of parallel streams blocking, i.e. am I running out of threads? Or how is the behavior stream-upwards at the point where the stream is parallelised?

Does someone else may have a similar issue and an answer to my question or some ideas how to tackle it?

Upvotes: 5

Views: 3000

Answers (1)

Eric Taix
Eric Taix

Reputation: 908

RichMapFunction is a good starting point but prefer RichAsyncFunctionwhich is asynchronous and which not block your processing !

Careful :
1- your DB access but also be asynchronous
2- your event order may change (according to the mode used)

More details : https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/asyncio.html

Hope it helps

Upvotes: 5

Related Questions