SriramN

Reputation: 501

Databricks Structured Streaming: where do the reliable receivers live in the streaming architecture?

I am trying to understand the Databricks Structured Streaming architecture, starting from the architecture diagram in the official Spark Streaming documentation.

Is this architecture diagram relevant for Structured Streaming as well?
If so, here are my questions:
Q1: The diagram shows the concept of reliable receivers. Where do these reliable receivers live: on the driver or on the workers? In other words, does reading from the source happen on the workers or on the driver?
Q2: As shown in the official Spark Streaming diagram, a receiver runs on a single machine that receives records. So if we have 20 partitions in an Event Hubs source, are we limited by the number of cores on the driver for the maximum number of concurrent reads? In other words, can we only perform concurrent reads from the source, not parallel ones?
Q3: Related to Q2, does this mean that parallelism in Structured Streaming can be achieved only for processing?
Below is my version of the architecture; please let me know if it needs any changes.

[Image: the asker's version of the architecture diagram]
Thanks in advance.


Answers (1)

SriramN

Reputation: 501

As per my understanding from the Spark Streaming documentation:
Answer for Q1: The receivers live on the worker nodes.
Answer for Q2: Since the receivers run on the workers, in a cluster the driver's cores do not limit the number of receivers. Each receiver occupies a single core, and receivers are allocated to executors in round-robin fashion.
Answer for Q3: Read parallelism can be achieved by increasing the number of receivers/partitions on the source.
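
To make the three answers concrete, here is a minimal plain-Python sketch of the receiver placement described above. It does not call Spark and is not Spark's actual scheduler; the function names `assign_receivers` and `cores_left_for_processing`, the executor names, the 8 cores per executor, and the choice of one receiver per source partition are all assumptions made up for illustration.

```python
from collections import defaultdict


def assign_receivers(num_receivers, executors):
    """Place receiver ids on executors in round-robin order (illustrative model only)."""
    placement = defaultdict(list)
    for receiver_id in range(num_receivers):
        executor = executors[receiver_id % len(executors)]
        placement[executor].append(receiver_id)
    return dict(placement)


def cores_left_for_processing(placement, cores_per_executor):
    """Each receiver occupies one core; whatever remains can run processing tasks."""
    return {
        executor: total - len(placement.get(executor, []))
        for executor, total in cores_per_executor.items()
    }


# Hypothetical cluster: 4 executors with 8 cores each, and 20 receivers
# (assumed here to be one per source partition). The driver is not involved.
executors = ["exec-0", "exec-1", "exec-2", "exec-3"]
cores = {name: 8 for name in executors}

placement = assign_receivers(20, executors)
free = cores_left_for_processing(placement, cores)
```

In this model the 20 receivers are spread evenly over the workers, so the reads run in parallel across executors rather than being bounded by the driver's cores. The trade-off is that every receiver takes a core away from processing, so the cluster needs more cores than receivers.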

This information is documented here.


Please correct me if this is incorrect. Thanks.

