Reputation: 887
In a Spark Streaming app, can I parallelize receivers across all machines in the cluster (and not just the master)? That way, all nodes in the Spark cluster read from an external stream simultaneously.
Upvotes: 2
Views: 362
Reputation: 37435
When deployed on a cluster, Spark Streaming will use as many cores as specified in spark.cores.max. We can programmatically create n receivers, and they will be spread over the cores reserved for the job, but there is no guarantee of an even spread over physical nodes.
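For instance, the core budget can be set on the SparkConf before the StreamingContext is created. A minimal sketch; the app name and the value "10" are illustrative, not part of the original answer:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Cap the total cores this job may take; each receiver permanently occupies one.
val conf = new SparkConf()
  .setAppName("streaming-receivers-example") // hypothetical app name
  .set("spark.cores.max", "10")              // illustrative value
val ssc = new StreamingContext(conf, Seconds(5))

Keep spark.cores.max strictly larger than the number of receivers, otherwise no cores remain for processing the received data.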
As an example using Kafka, here we create kafkaParallelism receivers:
// Create kafkaParallelism receivers, each backed by its own input DStream
@transient val inKafkaList: List[DStream[(K, V)]] = List.fill(kafkaParallelism) {
  KafkaUtils.createStream[K, V, KDecoder, VDecoder](ssc, kafkaConfig, topics, StorageLevel.MEMORY_AND_DISK_SER)
}
// Union the per-receiver streams into a single DStream for downstream processing
@transient val inKafka = ssc.union(inKafkaList)
Note that it's good practice to union the resulting DStreams to reduce the number of tasks generated.
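Downstream logic can then be written once against the unioned stream. A minimal sketch, assuming a simple per-batch count (the count/print pipeline is illustrative, not from the original answer):

// One set of tasks per batch is scheduled for the unioned stream,
// instead of one set per receiver.
val perBatchCounts = inKafka
  .map { case (_, value) => value }
  .count()
perBatchCounts.print()

ssc.start()
ssc.awaitTermination()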
Upvotes: 1