MJeremy
MJeremy

Reputation: 1250

What is the difference between Flink join and connect?

I am confused of the definitions. In documentation it seems that join is followed by a key defined, but connect does not need to specify key and the result of which is a connectedStream. What can we do with this conenctedStream and is there any concrete example that we use one rather than the other?

More, how is the connected stream looks like?

Thanks in advance

Upvotes: 7

Views: 4562

Answers (1)

twalthr
twalthr

Reputation: 2664

A connect operation is more general then a join operation. Connect ensures that two streams (keyed or unkeyed) meet at the same location (at the same parallel instance within a CoXXXFunction).

One stream could be a control stream that manipulates the behavior applied to the other stream. For example, you could stream-in new machine learning models or other business rules.

Alternatively, you can use the property of two streams that are keyed and meet at the same location for joining. Flink provides some predefined join operators.

However, joining of data streams often depends on different use case-specific behaviors such as "How long do you want to wait for the other key to arrive?", "Do you only look for one matching pair or more?", or "Are there late elements that need special treatment if no matching record arrives or the other matching record is not stored in state anymore?". A connect() allows you to implement your own joining logic if needed. The data Artisans training here explains one example of connect for joining.

Upvotes: 10

Related Questions