Reputation: 704
I am looking at running Spark batch jobs on Azure Synapse. I am currently able to test the runs using the Azure CLI for Synapse. In production, I need to trigger these Spark submissions from an external application (Prefect flows).
To submit the Spark job, I am looking at using the Azure Synapse SDK
(https://learn.microsoft.com/en-us/python/api/azure-mgmt-synapse/azure.mgmt.synapse.synapsemanagementclient?view=azure-python).
How do I pass the TokenCredential mentioned in the constructor documentation here: https://learn.microsoft.com/en-us/python/api/azure-mgmt-synapse/azure.mgmt.synapse.synapsemanagementclient?view=azure-python#constructor
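Concretely, this is roughly what I expect to end up with (a rough sketch only; the subscription ID, resource group, and workspace name below are placeholders):
from azure.identity import DefaultAzureCredential
from azure.mgmt.synapse import SynapseManagementClient

# Any azure.identity credential (DefaultAzureCredential, ClientSecretCredential, ...) implements TokenCredential
credential = DefaultAzureCredential()
client = SynapseManagementClient(credential=credential, subscription_id="<subscription-id>")

# Sanity check: list the Spark (big data) pools in the workspace
for pool in client.big_data_pools.list_by_workspace("<resource-group>", "<workspace-name>"):
    print(pool.name)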
Upvotes: 2
Views: 1394
Reputation: 704
This was rather straightforward; I had overlooked the documentation. The short version of the code is below. Note that the actual job submission goes through the data-plane SparkClient from azure-synapse-spark rather than SynapseManagementClient:
from azure.identity import ClientSecretCredential
from azure.synapse.spark import SparkClient
from azure.synapse.spark.models import SparkBatchJobOptions

def run(self, job_name: str, job_args):
    # Service principal credential; tenant/client/secret are attributes on the class
    credential = ClientSecretCredential(self.tenant_id, self.client_id, self.client_secret)

    # Data-plane client for the Spark pool. The endpoint is the workspace development
    # endpoint, e.g. https://<workspace-name>.dev.azuresynapse.net; both values are
    # assumed to be attributes on the class here.
    spark_client = SparkClient(
        credential=credential,
        endpoint=self.synapse_endpoint,
        spark_pool_name=self.spark_pool_name,
    )

    options = SparkBatchJobOptions.from_dict({
        "tags": None,
        "artifactId": None,
        "name": f"{job_name}",
        "file": f"{job_name}.py",
        "className": None,
        "args": job_args,
        "jars": [],
        "files": [],
        "archives": [],
        "conf": None,
        "driverMemory": "4g",
        "driverCores": 4,
        "executorMemory": "2g",
        "executorCores": 2,
        "numExecutors": 2,
    })

    job = spark_client.spark_batch.create_spark_batch_job(options, detailed=False)
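If the calling flow needs to wait for the outcome, the returned SparkBatchJob exposes an id and a state that can be polled with get_spark_batch_job. Continuing from the job handle above (a rough sketch; the terminal state names come from Livy and the polling interval is arbitrary):
import time

# Poll until the batch reaches a terminal Livy state
terminal_states = {"success", "dead", "killed", "error"}
while True:
    current = spark_client.spark_batch.get_spark_batch_job(job.id, detailed=True)
    if current.state in terminal_states:
        break
    time.sleep(30)
print(f"Spark batch job {job.id} finished with state {current.state}")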
Upvotes: 1