Reputation: 11023
Is there a way to find out what the driver IP is on a Databricks cluster? The Ganglia UI shows all the nodes on the main page, and there doesn't seem to be a way to filter for the driver only.
Upvotes: 3
Views: 16138
Reputation: 61
Are we talking about the internal IP address (the one the previous answers address) or the external IP address (the one a third party sees when, for example, we invoke an external API from our cluster)?
If we are talking about the second one, I have a small notebook to illustrate it:
def get_external_ip(x):
    # Ask an external service which public IP it sees for this node.
    import requests
    import socket
    hostname = socket.gethostname()
    r = requests.get("https://api.ipify.org/")
    public_ip = r.text  # .text gives a str; .content would return bytes
    return f"#{x} From {hostname} with public IP {public_ip}."

print('DRIVER:')
# Calling the function directly runs it on the driver.
print(get_external_ip(0))

print('WORKERS:')
# Mapping over an RDD runs the function on the worker nodes.
rdd = sc.parallelize(range(1, 4)).map(get_external_ip)
for row in rdd.collect():
    print(row)
It prints the driver's external IP and the workers' external IPs (adjust the range to match the number of worker nodes).
I hope it is useful.
Upvotes: 6
Reputation: 461
You can go to the Spark cluster UI - Master tab within the cluster. The URL listed there contains the driver's IP, and the workers' IPs are listed at the bottom of the page.
Depending on your use case, it may be helpful to know that in an init script you can get the driver IP from the DB_DRIVER_IP environment variable (a Python sketch reading it is shown after the Scala snippet below): https://docs.databricks.com/clusters/init-scripts.html#environment-variables
There are other environment variables set at runtime that can be accessed in a Scala notebook:
System.getenv.get("MASTER") // spark://10.255.128.6:7077
System.getenv.get("SPARK_LOCAL_IP") // 10.255.128.6
Upvotes: 10