Reputation: 169
I have an Azure Synapse workspace with a small Spark pool in it. I have written the code so that the same Spark notebook, connecting to the same Spark pool, is called multiple times based on a parameter that I pass from the Synapse pipeline.
The problem is that when two pipelines start at the same time, the notebook activity runs sequentially, leaving the second instance "queued" as shown below -
How can I make them run in parallel, so that the notebook runs triggered from different pipelines start at the same time? More information -
Notebook code -
import logging

import findspark
findspark.init()
findspark.find()

from pyspark.sql import SparkSession
from data_mesh_etl import table1, table2

# Build (or reuse) the Spark session, pulling in the SQL Server JDBC
# driver and the hadoop-azure connector
spark = SparkSession.builder \
    .appName("MyApp") \
    .config("spark.jars.packages", "com.microsoft.sqlserver:mssql-jdbc:9.4.1.jre11,org.apache.hadoop:hadoop-azure:3.3.1") \
    .getOrCreate()

spark.conf.set('spark.sql.caseSensitive', True)
spark.conf.set('spark.sql.debug.maxToStringFields', 3000)

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# p_table_name is passed in from the calling Synapse pipeline
if p_table_name == 'table1':
    table1.load_table1_data_into_sql(spark, logger)
if p_table_name == 'table2':
    table2.load_table2_data_into_sql(spark, logger)
I pass the parameter p_table_name from pipeline_table1 with the value table1, and from pipeline_table2 with the value table2.
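For reference, p_table_name reaches the notebook through a parameters cell (marked via "Toggle parameter cell" in Synapse Studio), whose default the pipeline's Notebook activity base parameter overrides at run time. A minimal sketch of that cell, with an illustrative default value:

# Parameters cell - the base parameter p_table_name passed by the
# calling pipeline overrides this default value at run time
p_table_name = 'table1'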
When these two pipelines start at the same time, shouldn't my notebook also have two instances running in parallel? Is there a Spark concurrency setting that I am missing here?
Can someone please help with this?
TIA!
Sanket Kelkar
Upvotes: 1
Views: 1907
Reputation: 169
Answering my own question here - I got it working by simply increasing the size of the Spark pool. Each notebook or job run creates its own Spark application that reserves driver and executor nodes from the pool, so when the pool is too small to host a second application, that run waits in the queue; a bigger pool leaves room for both. See the attached screenshot, where four Spark job definitions were called at the same time and ran in parallel.
Note - here I tried Spark job definitions, but the same works on notebooks as well.
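If resizing the pool is not an option, another approach (untested in my setup, values illustrative) is to shrink what each session requests by putting the %%configure session magic at the top of the notebook, so that several sessions fit in the pool at once:

%%configure -f
{
    "driverMemory": "4g",
    "driverCores": 2,
    "executorMemory": "4g",
    "executorCores": 2,
    "numExecutors": 2
}

With smaller per-session footprints, the pool's nodes can host more than one concurrent Spark application instead of queueing the second one.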
Upvotes: 1