Lastik

Reputation: 991

Merging two streams in Spark Streaming

Could you point me in the right direction on the following question? (Even a link to the documentation containing the required info would be appreciated.)

Is there a way to merge multiple streams of data into a single stream of tuples?

E.g. we have stream A with elements (A1, t1), (A2, t2), ..., (An, tn) and stream B with elements (B1, t1'), (B2, t2'), ..., (Bn, tn').

Here t is the timestamp of each value (the values are actually time series).

I would like to obtain a stream C with values

(A1", B1", t1"), ..., (An", Bn", tn")

The timestamps in streams A and B can differ (that's why I am using ' and "). Metrics can be consumed at different times and at different rates. In that case, when merging the streams, the most recent value up to the required timestamp must be taken from each stream.
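To make the requirement concrete, here is a minimal sketch in plain Python (not Spark) of the merge described above. It assumes each stream is a list of (value, time) samples already sorted by time, that "latest" means the last sample at or before the requested time, and that the output timestamps are supplied by the caller. The function names and sample data are hypothetical.

```python
from bisect import bisect_right

def latest_at_or_before(series, t):
    """Return the value of the last (value, time) sample with time <= t, or None."""
    times = [ts for _, ts in series]
    i = bisect_right(times, t)  # number of samples with time <= t
    return series[i - 1][0] if i > 0 else None

def merge_as_of(a, b, output_times):
    """Build one (A value, B value, t) tuple per requested output time."""
    return [(latest_at_or_before(a, t), latest_at_or_before(b, t), t)
            for t in output_times]

# Hypothetical samples, each list sorted by timestamp: (value, time)
stream_a = [("A1", 1), ("A2", 4), ("A3", 7)]
stream_b = [("B1", 2), ("B2", 3), ("B3", 8)]

merged = merge_as_of(stream_a, stream_b, [3, 6, 9])
print(merged)
```

With these samples, the tuple for time 6 combines A2 (seen at time 4) with B2 (seen at time 3), since those are the most recent values from each stream at that point.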

Upvotes: 6

Views: 3376

Answers (1)

Laeeq

Reputation: 365

You can use DStream.join. When called on two DStreams of (K, V) and (K, W) pairs, it returns a new DStream of (K, (V, W)) pairs containing all pairs of elements for each key.
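As a rough illustration of that pairing, here is a plain-Python model of what an inner join produces for a single batch. This is not Spark code: `join_batch` is a hypothetical helper, and the metric names and values are made up. In Spark Streaming the join is applied to the RDDs of the same batch interval, and both streams must share a key, so time-series elements would first need to be keyed, for example by a rounded timestamp (an assumption, not something stated above).

```python
from collections import defaultdict

def join_batch(left, right):
    """Inner-join two lists of (key, value) pairs into (key, (v, w)) pairs."""
    right_by_key = defaultdict(list)
    for k, w in right:
        right_by_key[k].append(w)
    # Emit every (v, w) combination for keys present on both sides.
    return [(k, (v, w)) for k, v in left for w in right_by_key.get(k, [])]

# Hypothetical contents of one batch from each stream
batch_a = [("cpu", 0.71), ("mem", 0.42)]
batch_b = [("cpu", 0.65), ("cpu", 0.80), ("disk", 0.10)]

joined = join_batch(batch_a, batch_b)
print(joined)
```

Keys that appear on only one side ("mem" and "disk" here) are dropped, which is why a plain join alone does not pick the most recent value when the two streams arrive at different rates.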

Upvotes: 5
