Reputation: 475
I'm trying to figure out whether a single task is ever run using all available cores on the executor. I.e., if a stage contains only one task, does that mean the task is single-threaded, single-core processing on the executor, or could the task use all available cores in a multithreaded fashion "under the covers"?
I'm running ETL jobs in Azure Databricks on one worker (hence one executor) and at one point in the pipeline a single job creates a single stage that runs a single task to process the entire dataset. The task takes a few minutes to complete.
I want to understand whether a single task can use all available executor cores, running functions in parallel, or not. In this case I deserialize JSON messages with the from_json function and save them as Parquet files. I'm worried this is a single-threaded process going on inside the single task:
spark
  .read
  .table("input")
  .withColumn("Payload", from_json($"Payload", schema))
  .write
  .mode(SaveMode.Append)
  .saveAsTable("output")
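For context, here is a minimal sketch (reusing the "input" table name above) of how the partition count of the input can be checked; the number of partitions of the read is what determines how many tasks the first stage gets:

// Sketch only: inspect the partition count of the input table.
// A Dataset's underlying RDD exposes its partition count directly.
val numPartitions = spark.read.table("input").rdd.getNumPartitions
println(s"input partitions: $numPartitions")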
Upvotes: 0
Views: 1343
Reputation: 6099
If you look at the Spark UI and see only one task, that stage is definitely single-cored and single-threaded. For instance, if you do a join followed by a transformation, you will see something like 200 tasks by default, which means 200 "threads" are computing in parallel.
If you want to check the number of executors, you can click on the Stages tab, click on any stage, and you will see how many executors were used.
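If you prefer to check it from code, a minimal sketch using the status tracker (note that it also counts the driver, so a single-worker cluster typically reports two entries):

// Sketch: list the executors the application currently has.
val executors = spark.sparkContext.statusTracker.getExecutorInfos
println(s"number of executors (including driver): ${executors.length}")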
Upvotes: 1