Vibhuti

Reputation: 1634

Network Spark Streaming from multiple remote hosts

I wrote a program for Spark Streaming in Scala. In my program, I pass a remote host and remote port to socketTextStream.

And on the remote machine, I have a Perl script that calls this system command:

echo 'data_str' | nc <remote_host> 9999

That way, my Spark program is able to get data, but it seems a bit confusing because I have multiple remote machines that need to send data to the Spark machine. I wanted to know the right way of doing this. In fact, how will I deal with data coming from multiple hosts?

For reference, my current program:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object HBaseStream { // enclosing object name assumed from the app name
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("HBaseStream")
    val sc = new SparkContext(conf)

    val ssc = new StreamingContext(sc, Seconds(2))

    // Connects as a TCP client to <remote-host>:9999 and reads lines of text
    val inputStream = ssc.socketTextStream("<remote-host>", 9999)
    // ------------------- (transformations on inputStream elided)
    // -------------------

    ssc.start()
    // Wait for the computation to terminate
    ssc.awaitTermination()
  }
}

Thanks in advance.

Upvotes: 1

Views: 384

Answers (1)

Shawn Guo

Reputation: 3228

You can find more information in the "Level of Parallelism in Data Receiving" section of the Spark Streaming Programming Guide.

Summary:

  • Receiving multiple data streams can therefore be achieved by creating multiple input DStreams and configuring them to receive different partitions of the data stream from the source(s).
  • These multiple DStreams can be unioned together to create a single DStream. The transformations that were being applied to a single input DStream can then be applied to the unified stream (see the sketch below).
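For illustration, here is a minimal sketch of that approach in Scala, modeled on the program in the question. The host names, ports, and the MultiHostStream object name are assumptions for the example, not part of your setup:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object MultiHostStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("HBaseStream")
    val ssc = new StreamingContext(conf, Seconds(2))

    // Hypothetical senders; each (host, port) pair gets its own receiver
    val senders = Seq(("remote-host-1", 9999), ("remote-host-2", 9999), ("remote-host-3", 9999))

    // One input DStream per remote machine
    val streams = senders.map { case (host, port) => ssc.socketTextStream(host, port) }

    // Union them into a single DStream and apply the transformations once
    val unified = ssc.union(streams)
    unified.print() // replace with your actual processing, e.g. writing to HBase

    ssc.start()
    ssc.awaitTermination()
  }
}

Note that each socket receiver occupies one core, so the application must be allocated more cores than the number of receivers, plus enough cores to process the received data.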

Upvotes: 1
