Raúl García

Reputation: 344

Multithreading inside Flink's Map/Process function

I have a use case where I need to apply multiple functions to every incoming message, each producing zero or more results.

Applying them in a loop won't scale for me, and ideally I would like to emit results as soon as they are ready instead of waiting for all the functions to be applied.

I thought about using Async I/O for this, maintaining a thread pool, but if I am not mistaken I can only emit one record with that API. That is not a deal-breaker, but I'd like to know whether there are other options, such as using a thread pool inside a Map/Process function so that I can send the results as they become ready.

Would this be an anti-pattern, or cause any problems with checkpointing or at-least-once guarantees?

Upvotes: 1

Views: 466

Answers (1)

David Anderson

Reputation: 43409

Depending on the number of different functions involved, one solution would be to fan each incoming message out to n operators, each applying one of the functions.


I fear you'll get into trouble if you try this with a multi-threaded map/process function.

How about this instead:

You could have something like a RichCoFlatMapFunction (or KeyedCoProcessFunction, or BroadcastProcessFunction) that is aware of all of the currently active functions and, for each incoming event, emits n copies of it, each enriched with info about one specific function to be performed. That can be followed by an async I/O operator with a thread pool, which takes care of executing the functions and emitting results if and when they become available.
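To make the shape of that pipeline concrete, here is a rough plain-Java model of the two stages. It deliberately uses no Flink classes, so it is only a sketch of the idea, not of the real operators: `Task`, `fanOut`, `runAsync`, and the function registry are all hypothetical names made up for illustration. In an actual job, `fanOut` would be the body of the fan-out operator and `runAsync` would roughly correspond to what an async function does with its thread pool. Because each copy of the event carries exactly one function, each copy can complete on its own, so results for fast functions are not held back by slow ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;
import java.util.function.Function;

// Plain-Java model of the two stages; all names here are hypothetical.
class FanOutAsyncSketch {

    // One unit of work: an event paired with the name of the function to apply.
    static final class Task {
        final String event;
        final String functionName;

        Task(String event, String functionName) {
            this.event = event;
            this.functionName = functionName;
        }
    }

    // Stage 1 (role of the fan-out operator): one Task per currently active function.
    static List<Task> fanOut(String event, List<String> activeFunctions) {
        List<Task> tasks = new ArrayList<>();
        for (String name : activeFunctions) {
            tasks.add(new Task(event, name));
        }
        return tasks;
    }

    // Stage 2 (role of the async operator): run each task on the pool and pass its
    // results to `emit` as soon as that task finishes. `emit` may be called from
    // pool threads, so it must be thread-safe. An unknown function name makes
    // that task's future complete exceptionally.
    static CompletableFuture<Void> runAsync(
            List<Task> tasks,
            Map<String, Function<String, List<String>>> registry,
            ExecutorService pool,
            Consumer<String> emit) {
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (Task task : tasks) {
            Function<String, List<String>> fn = registry.get(task.functionName);
            futures.add(CompletableFuture
                    .supplyAsync(() -> fn.apply(task.event), pool)
                    .thenAccept(results -> results.forEach(emit)));
        }
        // Completes once every task has emitted (or failed).
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));
    }
}
```

A caller would build the registry once, call `fanOut` per event, and pass a thread-safe sink (for example a `ConcurrentLinkedQueue::add`) as `emit`. The order in which results arrive depends on which function finishes first, so downstream logic should not rely on it.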

Upvotes: 1
