Ranjit Iyer

Reputation: 887

Parallelize receivers across machines in Spark Streaming

In a Spark Streaming app, can I parallelize receivers across all machines in the cluster (and not just the master)? That way, all nodes in the Spark cluster would be reading from an external stream simultaneously.

Upvotes: 2

Views: 362

Answers (1)

maasg

Reputation: 37435

When deployed on a cluster, Spark Streaming will use as many cores as specified in spark.cores.max. We can programmatically create n receivers, and they will be spread over the cores reserved for this job, but there is no guarantee of an even spread across physical nodes.

As an example using Kafka, here we create kafkaParallelism receivers:

    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.dstream.DStream
    import org.apache.spark.streaming.kafka.KafkaUtils

    // Create kafkaParallelism receivers, then union them into a single DStream
    @transient val inKafkaList: List[DStream[(K, V)]] = List.fill(kafkaParallelism) {
      KafkaUtils.createStream[K, V, KDecoder, VDecoder](ssc, kafkaConfig, topics, StorageLevel.MEMORY_AND_DISK_SER)
    }
    @transient val inKafka = ssc.union(inKafkaList)

Note that it is good practice to union the resulting DStreams to reduce the number of tasks generated.
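Putting it together, here is a minimal, self-contained sketch of the same pattern with concrete types, using the Spark 1.x receiver-based spark-streaming-kafka API. The app name, topic name, ZooKeeper address, consumer group, core count, and receiver count are all hypothetical placeholders:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    // Reserve enough cores: one per receiver, plus spare cores for processing
    val conf = new SparkConf()
      .setAppName("parallel-receivers")
      .set("spark.cores.max", "8")
    val ssc = new StreamingContext(conf, Seconds(5))

    val kafkaConfig = Map(
      "zookeeper.connect" -> "zk:2181",       // hypothetical ZooKeeper quorum
      "group.id"          -> "parallel-demo") // hypothetical consumer group
    val topics = Map("events" -> 1)           // hypothetical topic, 1 consumer thread per receiver
    val kafkaParallelism = 4                  // number of receivers to create

    // Create the receivers and union them into one DStream before processing
    val streams = List.fill(kafkaParallelism) {
      KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
        ssc, kafkaConfig, topics, StorageLevel.MEMORY_AND_DISK_SER)
    }
    val inKafka = ssc.union(streams)

    // Downstream operations run once on the unioned stream,
    // not once per receiver
    inKafka.map(_._2).count().print()

    ssc.start()
    ssc.awaitTermination()

Keep in mind that each receiver permanently occupies one core, so spark.cores.max must be strictly greater than the number of receivers or no cores will be left to process the received data.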

Upvotes: 1
