Fanooos

Reputation: 2828

How to join two (or more) streams (JavaDStream) in Apache Spark

We have a Spark Streaming application that consumes the Gnip compliance stream.

In the old version of the API, the compliance stream was provided by a single endpoint, but now it is provided by 8 different endpoints.

We could run the same Spark application 8 times with different parameters, one per endpoint.

Is there a way in Spark Streaming to consume all 8 endpoints and merge them into one stream within the same application?

Should we use a separate streaming context for each connection, or is one context enough?

Upvotes: 2

Views: 583

Answers (1)

Amit Kumar

Reputation: 2745

I think you are looking for Spark's union operation here.

For examples, see: Concatenating datasets of different RDDs in Apache Spark using Scala

The Spark documentation describes union as follows:

Return a new dataset that contains the union of the elements in the source dataset and the argument.
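
For the streaming case in the question, the same idea should apply to DStreams: create one input stream per endpoint on a single streaming context and fold them together with a pairwise union. As far as I know, only one StreamingContext can be active in a JVM at a time, so one context with 8 input streams is the usual shape, and each receiver occupies a core, so the application needs more cores than receivers. Below is a minimal sketch of the fold itself. The class name UnionAll, the helper unionAll and the list-based concat stand-in are hypothetical names made up for illustration so the block runs without Spark; the Spark call in the comment assumes JavaDStream.union(JavaDStream) and has not been compiled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

public class UnionAll {

    // Folds a non-empty list of streams into one by applying a pairwise union.
    public static <S> S unionAll(List<S> streams, BinaryOperator<S> union) {
        if (streams.isEmpty()) {
            throw new IllegalArgumentException("need at least one stream");
        }
        S merged = streams.get(0);
        for (S next : streams.subList(1, streams.size())) {
            merged = union.apply(merged, next);
        }
        return merged;
    }

    // Stand-in for a pairwise stream union: plain list concatenation.
    // It exists only so the demo below runs without Spark on the classpath.
    public static List<String> concat(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    public static void main(String[] args) {
        // Three lists standing in for three endpoint streams.
        List<List<String>> endpoints = Arrays.asList(
                Arrays.asList("a1", "a2"),
                Arrays.asList("b1"),
                Arrays.asList("c1", "c2"));

        System.out.println(unionAll(endpoints, UnionAll::concat));
        // prints [a1, a2, b1, c1, c2]

        // With Spark Streaming the call would look roughly like this
        // (assumed usage, one context, one input stream per endpoint):
        //   List<JavaDStream<String>> streams = ...; // built from one JavaStreamingContext
        //   JavaDStream<String> all = unionAll(streams, JavaDStream::union);
    }
}
```

The merged stream can then be processed exactly like the original single-endpoint stream, so the rest of the application should not need to change.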

Upvotes: 1
