Reputation: 10469
Situation: multiple identical kafka datasources that get flatmapped into tuples for later union, reduction, saving, whatnot.
I need to know which original datasource each flatmapped packet came from to tag in the tuple. I'd rather not have a separate FlatMapFunction for each datasource as there may be hundreds.
Ideally I'd be able to pass some value into the flatmap function to add to the resultant tuple.
Possible? Some other way to achieve this?
Upvotes: 3
Views: 949
Reputation: 62350
As you have multiple source operators, you can simply configure the different sources via constructor arguments. As an alternative, you could also use broadcast variables: https://cwiki.apache.org/confluence/display/FLINK/Variables+Closures+vs.+Broadcast+Variables
About union: it depends on your semantics you need. If you do it before the reduce
step, the partitions are built over all sources -- if you do it after the reduce, you get partitions per source. Thus, if two sources emit a tuple with the same key, they end up in different partitions. Doing the union before flatMap
disallows flatMap
to get chained with the source -- I would expect a performance penalty if chaining is prohibited.
Upvotes: 1