navige
navige

Reputation: 2517

Spark 2: check if node is driver or worker

I have a Spark 2 application that uses grpc so that client applications can connect to it.

However, I want the grpc code only to be started on the driver node and not on the workers.

Is there a possibility in Spark 2 to check if the node the code is currently running on is the driver node?

Upvotes: 0

Views: 2581

Answers (4)

Ben Taylor
Ben Taylor

Reputation: 515

Having just tried it, the other thing that happens on the executor is the process ID changes. So you can use BiS's idea of using an environment variable, but instead of setting one of those, capture the pid when the process starts on the driver. Then whenever you want to know if you're on an executor, you should be able to compare the current pid to the captured one - if they're different, you're on an executor.

from os import getpid

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf

spark = SparkSession.builder.getOrCreate()

print(f"Driver pid: {getpid()}")

@udf("long")
def udf_pid(s):
  return (getpid())
  
d = [{'name': 'Alice', 'age': 3}, {'name': 'Bob', 'age': 5}]
df = spark.createDataFrame(d)
output = df.select("age", squared_udf("age").alias("worker_pid"))    
output.show()

Output

Driver pid: 5890
+---+----------+
|age|worker_pid|
+---+----------+
|  3|      6026|
|  5|      6057|
+---+----------+

(Excuse weird test data, adapted from an example in the databricks docs)

Upvotes: 0

GraceMeng
GraceMeng

Reputation: 1099

We can tell a node is the driver when TaskContext.get() is None normally, unless TaskContext is created explicitly in the driver for testing purposes.

Upvotes: 0

Assaf Mendelson
Assaf Mendelson

Reputation: 13001

You can get the driver hostname by:

sc.getConf.get("spark.driver.host")

Upvotes: 1

BiS
BiS

Reputation: 503

I don't like the way to do it using "hosts", you depend on matching the right interface and also, same node can contain both drivers and masters. Personally, I set a environment variable

spark.executorEnv.RUNNING_ON_EXECUTOR=yes

and then in my code (using Python here, but it should work in any other language):

import os
if "RUNNING_ON_EXECUTOR" in os.environ:
       //Run executor code
else:
       //run driver code

Upvotes: 5

Related Questions