ethrbunny
ethrbunny

Reputation: 10469

flink - inject values into flatmap

Situation: multiple identical kafka datasources that get flatmapped into tuples for later union, reduction, saving, whatnot.

I need to know which original datasource each flatmapped packet came from to tag in the tuple. I'd rather not have a separate FlatMapFunction for each datasource as there may be hundreds.

Ideally I'd be able to pass some value into the flatmap function to add to the resultant tuple.

Possible? Some other way to achieve this?

Upvotes: 3

Views: 949

Answers (1)

Matthias J. Sax
Matthias J. Sax

Reputation: 62350

As you have multiple source operators, you can simply configure the different sources via constructor arguments. As an alternative, you could also use broadcast variables: https://cwiki.apache.org/confluence/display/FLINK/Variables+Closures+vs.+Broadcast+Variables

About union: it depends on your semantics you need. If you do it before the reduce step, the partitions are built over all sources -- if you do it after the reduce, you get partitions per source. Thus, if two sources emit a tuple with the same key, they end up in different partitions. Doing the union before flatMap disallows flatMap to get chained with the source -- I would expect a performance penalty if chaining is prohibited.

Upvotes: 1

Related Questions