Reputation: 592
In my application, I want to enrich an infinite stream of events. The stream itself is parallelised by hashing of an Id. For every event, there might be a call to an external source (e.g. REST, DB). This call is blocking by nature. The order of events within one stream partition must be maintained.
My idea was to create a RichMapFunction, which sets up the connection and then polls the external source for each event. The blocking call usually takes not to long, but in the worst case, the service could be down.
Theoretically, this works, but I don't feel good doing it this way, as I don't know how Flink reacts if you have some blocking operations within the stream. And what happens if you have a lot of parallel streams blocking, i.e. am I running out of threads? Or how is the behavior stream-upwards at the point where the stream is parallelised?
Does someone else may have a similar issue and an answer to my question or some ideas how to tackle it?
Upvotes: 5
Views: 3000
Reputation: 908
RichMapFunction
is a good starting point but prefer RichAsyncFunction
which is asynchronous and which not block your processing !
Careful :
1- your DB access but also be asynchronous
2- your event order may change (according to the mode used)
More details : https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/asyncio.html
Hope it helps
Upvotes: 5