Azure Databricks architecture: communication between the control plane and data plane, and authentication

I am trying to understand the Azure Databricks architecture based on this link. I understand the purpose of the control plane and the data plane in the Azure Databricks architecture, but I am unclear on the following questions.

Upvotes: 3

Views: 1009

Answers (2)

Phil Phil

Reputation: 1

There might also be a difference between the serverless and the "traditional" data plane: communication between the serverless data plane and the control plane goes over the cloud provider's backbone.

A good architecture overview is provided here: https://github.com/WowdyCloudy/wowdycloudy/blob/main/dbx/architecture.md#databricks-architecture

Upvotes: 0

Alex Ott

Reputation: 87259

There are two ways for the control plane and the data plane to communicate:

  1. Legacy: the VMs running in the data plane must have public IPs, and the control plane reaches them directly. This approach was always a security headache. Azure still supports it and shows it in the UI, but it shouldn't be used.
  2. "No Public IP (NPIP)", also called "Secure Cluster Connectivity" (doc and more technical details). In this case, when the VMs in the data plane start, they open a bi-directional tunnel to a relay in the control plane, and that tunnel is always used for controlling the VMs and Spark. In this setup, the VMs don't need public IPs, and it is much more secure and easier to control.
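
To make option 2 concrete, here is a minimal sketch of how the setting might appear when a workspace is deployed from an ARM template. It only builds the resource fragment as a Python dict and prints it as JSON; it does not deploy anything. The `enableNoPublicIp` parameter name and the `2018-04-01` API version are what I recall from the Azure template reference for `Microsoft.Databricks/workspaces`, so verify them against the current docs. The helper name `workspace_resource`, the workspace name, the region, the SKU, and the resource group ID are all hypothetical placeholders.

```python
import json


def workspace_resource(name, location, managed_rg_id, no_public_ip=True):
    """Build an ARM resource fragment for an Azure Databricks workspace.

    Hypothetical helper for illustration only. The parameter name
    `enableNoPublicIp` is assumed from the Azure template reference;
    check it against the current documentation before relying on it.
    """
    return {
        "type": "Microsoft.Databricks/workspaces",
        "apiVersion": "2018-04-01",  # assumed API version
        "name": name,
        "location": location,
        "sku": {"name": "premium"},  # placeholder SKU
        "properties": {
            "managedResourceGroupId": managed_rg_id,
            "parameters": {
                # True  -> Secure Cluster Connectivity (no public IPs on cluster VMs)
                # False -> legacy mode, cluster VMs get public IPs
                "enableNoPublicIp": {"value": no_public_ip},
            },
        },
    }


if __name__ == "__main__":
    # All values below are placeholders, not real resources.
    resource = workspace_resource(
        name="example-workspace",
        location="westeurope",
        managed_rg_id="/subscriptions/<subscription-id>/resourceGroups/<managed-rg>",
    )
    print(json.dumps(resource, indent=2))
```

The sketch defaults `no_public_ip` to `True`, in line with the recommendation above that the legacy mode shouldn't be used.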

Regarding authentication: it's an internal detail, but it provides a way of ensuring that the VMs communicating with the control plane really are the VMs that form the cluster.

Upvotes: 1
