Reputation: 7899
I have three different streams coming from different sources (Objects: Trade,MarketData, WeightAdj
, the only thing they have common is 'product'. Here is my streams.
Trade stream: tradeid, product, executions
MarketData stream: product, marketData
Computestream: product, factor
What I want to achieve using Flink
I want to join all three streams and produce latest value of Tuple3<Trade,MarketData,WeightAdj >
. Which means every time if any of these stream emit an event I should get latest of Tuple3<Trade,MarketData,WeightAdj>
I tried joining these streams using 'connect' function followed by keyBy
but it is not producing Enriched object if MarketData or WeightAdj events are emit.
public static void main(String[] args) throws Exception {
// some code
tradeStream.connect(marketStream)
.keyBy(
new KeySelector<Trade, String>() {
@Override
public String getKey(Trade trd) throws Exception {
return trd.product;
}
}, new KeySelector<MarketData, String>() {
@Override
public String getKey(MarketData marketData)
throws Exception {
return marketData.product;
}
}
)
.flatMap(new JoinRichCoFlatMapFunction())
.connect(weightStream)
.keyBy(new KeySelector<Tuple2<Trade, MarketData>, String>() {
@Override
public String getKey(Tuple2<Trade, MarketData> trd) throws Exception {
return trd.f0.product;
}
}, new KeySelector<WeightAdj, String>() {
@Override
public String getKey(WeightAdj wght) throws Exception {
return wght.product;
}
})
.flatMap(new TupleWeightJionRichCoFlatMapFunction())
.print();
}
public static final class JoinRichCoFlatMapFunction extends RichCoFlatMapFunction<Trade, MarketData, Tuple2<Trade, MarketData>>{
private ValueState<Trade> trades;
private ValueState<MarketData> marketData;
@Override
public void open(Configuration config) {
trades = getRuntimeContext().getState(new ValueStateDescriptor<>("Trades", Trade.class));
marketData = getRuntimeContext().getState(new ValueStateDescriptor<>("MarketData", MarketData.class));
}
@Override
public void flatMap1(Trade trd,Collector<Tuple2<Trade, MarketData>> out) throws Exception {
MarketData mktData = marketData.value();
if (mktData != null) {
marketData.clear();
out.collect(new Tuple2<Trade, MarketData>(trd, mktData));
} else {
trades.update(trd);;
}
}
@Override
public void flatMap2(MarketData mktData,Collector<Tuple2<Trade, MarketData>> out) throws Exception {
Trade trd = trades.value();
if (trd != null) {
trades.clear();
out.collect(new Tuple2<Trade, MarketData>(trd, mktData));
} else {
marketData.update(mktData);;
}
}
}
public static final class TupleWeightJionRichCoFlatMapFunction extends RichCoFlatMapFunction<Tuple2<Trade, MarketData>, WeightAdj, Tuple3<Trade, MarketData, WeightAdj>>{
private ValueState<Tuple2<Trade, MarketData>> tradeMarketState;
private ValueState<WeightAdj> weightState;
@Override
public void open(Configuration config) {
TypeInformation<Tuple2<Trade, MarketData>> info = TypeInformation.of(new TypeHint<Tuple2<Trade, MarketData>>(){});
tradeMarketState = getRuntimeContext().getState(new ValueStateDescriptor<>("Trades", info));
weightState = getRuntimeContext().getState(new ValueStateDescriptor<>("Weights", WeightAdj.class));
}
@Override
public void flatMap1(Tuple2<Trade, MarketData> trdWithMaktData, Collector<Tuple3<Trade, MarketData, WeightAdj>> out)
throws Exception {
WeightAdj weigt = weightState.value();
if (weigt != null) {
weightState.clear();
out.collect(new Tuple3<Trade, MarketData, WeightAdj>(trdWithMaktData.f0, trdWithMaktData.f1, weigt));
} else {
tradeMarketState.update(trdWithMaktData);;
}
}
@Override
public void flatMap2(WeightAdj weightData,Collector<Tuple3<Trade, MarketData, WeightAdj>> out) throws Exception {
Tuple2<Trade, MarketData> trdWithMktData = tradeMarketState.value();
if (trdWithMktData != null) {
tradeMarketState.clear();
out.collect(new Tuple3<Trade, MarketData, WeightAdj>(trdWithMktData.f0, trdWithMktData.f1, weightData));
} else {
weightState.update(weightData);;
}
}
}
Any idea what am I doing wrong?
Upvotes: 2
Views: 3519
Reputation: 43707
If I understand your goals correctly, there are a couple of points that need to be handled differently:
clear()
on any of the state, because you need to continue to remember the last value you've seen from each of the three streams.out.collect()
. If flatmap1
or flatmap2
is being called, that means that something has been updated, so there's something new to report out.(It looks like you are mimicking the logic used in the RidesAndFares exercise from the Flink training. In that exercise the requirements are different: in that case there is a pair of Ride and Fare events that need to be combined, on a one-time basis. After finding a Ride/Fare pair for a given rideId, the join is done for that rideId.)
And now a couple of caveats:
clear()
and if the product space is unbounded, then you'll be keeping an ever-increasing amount of state indefinitely. If this is an issue, you can use state TTL to arrange for stale state to be cleared.public void flatMap1(Trade trd, Collector<Tuple2<Trade, MarketData>> out) throws Exception {
trades.update(trd);;
MarketData mktData = marketData.value();
out.collect(new Tuple2<Trade, MarketData>(trd, mktData));
}
but when the application starts, this may produce a Tuple2
where mktData
is null. So it would be a good idea to protect against that.
As Arvid mentioned, the Table/SQL API makes these kinds of joins easy.
Upvotes: 3