Reputation: 1099
Spark's architecture revolves entirely around the concepts of executors and cores. I would like to see, in practice, how many executors and cores are running for my Spark application in a cluster.
I tried the snippet below in my application, but with no luck:
val conf = new SparkConf().setAppName("ExecutorTestJob")
val sc = new SparkContext(conf)
conf.get("spark.executor.instances")
conf.get("spark.executor.cores")
Is there any way to get those values using the SparkContext object, the SparkConf object, etc.?
Upvotes: 8
Views: 17345
Reputation: 865
This is an old question, but this is my code for figuring this out on Spark 2.3.0:
# getExecutorInfos() lists the driver alongside the executors, hence the "- 1"
executor_count = len(spark.sparkContext._jsc.sc().statusTracker().getExecutorInfos()) - 1
cores_per_executor = int(spark.sparkContext.getConf().get('spark.executor.cores', '1'))
Upvotes: 8
Reputation: 29155
getExecutorStorageStatus
and getExecutorMemoryStatus
both return the number of executors including the driver, so the driver has to be filtered out,
as in the example snippet below:
/** Method that just returns the current active/registered executors
* excluding the driver.
* @param sc The spark context to retrieve registered executors.
* @return a list of executors each in the form of host:port.
*/
def currentActiveExecutors(sc: SparkContext): Seq[String] = {
  val allExecutors = sc.getExecutorMemoryStatus.map(_._1)
  val driverHost: String = sc.getConf.get("spark.driver.host")
  allExecutors.filter(!_.split(":")(0).equals(driverHost)).toList
}
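The filtering logic above (dropping the driver's entry by comparing the host part of each `host:port` address against `spark.driver.host`) can be sketched in plain Python without a cluster; the addresses and driver host below are hypothetical stand-ins for what `getExecutorMemoryStatus` would return:

```python
# Hypothetical host:port entries as keyed by getExecutorMemoryStatus,
# including the driver's own entry.
all_executors = ["10.0.0.5:35000", "10.0.0.6:35001", "10.0.0.1:40000"]
driver_host = "10.0.0.1"  # what spark.driver.host would hold

# Keep only entries whose host part differs from the driver host.
active_executors = [e for e in all_executors
                    if e.split(":")[0] != driver_host]
print(active_executors)       # the two worker entries
print(len(active_executors))  # 2
```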
sc.getConf.getInt("spark.executor.instances", 1)
Similarly, you can get all properties and print them like below; you may get the cores information as well:
sc.getConf.getAll.mkString("\n")
OR
sc.getConf.toDebugString
Typically, spark.executor.cores
holds this value for executors, and spark.driver.cores
should have it for the driver.
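The properties-with-defaults pattern (like `sc.getConf.getInt("spark.executor.instances", 1)` above) can be sketched with a plain dict standing in for the conf; the key/value pairs here are hypothetical, not read from a real cluster:

```python
# A plain dict standing in for SparkConf key/value pairs (hypothetical values).
conf = {
    "spark.executor.instances": "4",
    "spark.executor.cores": "2",
    "spark.driver.cores": "1",
}

# Mirror of sc.getConf.getInt(key, default): fall back when the key is unset.
def get_int(conf, key, default):
    return int(conf.get(key, default))

executors = get_int(conf, "spark.executor.instances", 1)
cores_per_executor = get_int(conf, "spark.executor.cores", 1)
print(executors * cores_per_executor)  # 8 total executor cores
```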
EDIT: In PySpark, the same can be accessed through the Py4J bindings exposed by the SparkSession:
sc._jsc.sc().getExecutorMemoryStatus()
Upvotes: 6
Reputation: 15
This is a Python example to get the number of cores (including the master's):
def workername():
    import socket
    return str(socket.gethostname())

anrdd = sc.parallelize(['', ''])
namesRDD = anrdd.flatMap(lambda e: (1, workername()))
namesRDD.count()
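Note that `count()` here counts emitted task results, not cores. A variant of the idea, counting distinct reported hostnames to approximate the number of worker machines, can be sketched without a cluster; the hostname list below is a hypothetical stand-in for what the RDD tasks would return via `socket.gethostname()`:

```python
# Hypothetical hostnames that tasks might report from the workers;
# on a real cluster these would come back from the RDD's map tasks.
reported = ["worker-1", "worker-2", "worker-1", "worker-2", "worker-3"]

# Distinct hostnames approximate the number of worker machines
# (this counts hosts, not cores).
worker_count = len(set(reported))
print(worker_count)  # 3
```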
Upvotes: -3