Reputation: 1250
I am confused of the definitions. In documentation it seems that join
is followed by a key
defined, but connect
does not need to specify key
and the result of which is a connectedStream
. What can we do with this conenctedStream
and is there any concrete example that we use one rather than the other?
More, how is the connected stream
looks like?
Thanks in advance
Upvotes: 7
Views: 4562
Reputation: 2664
A connect
operation is more general then a join operation. Connect ensures that two streams (keyed or unkeyed) meet at the same location (at the same parallel instance within a CoXXXFunction
).
One stream could be a control stream that manipulates the behavior applied to the other stream. For example, you could stream-in new machine learning models or other business rules.
Alternatively, you can use the property of two streams that are keyed and meet at the same location for joining. Flink provides some predefined join operators.
However, joining of data streams often depends on different use case-specific behaviors such as "How long do you want to wait for the other key to arrive?", "Do you only look for one matching pair or more?", or "Are there late elements that need special treatment if no matching record arrives or the other matching record is not stored in state anymore?". A connect()
allows you to implement your own joining logic if needed. The data Artisans training here explains one example of connect for joining.
Upvotes: 10