Sanket Kelkar

Reputation: 169

Azure Synapse - Run the same notebook in parallel from 2 pipelines

I have an Azure Synapse workspace with a small Spark pool in it. I have written the code so that the same Spark notebook, attached to the same Spark pool, is called multiple times based on a parameter that I pass from the Synapse pipeline.

The problem is that when two pipelines start at the same time, the notebook activities run sequentially, leaving the second instance in the "Queued" state, as shown below:

[Screenshot: the second notebook run stuck in the Queued state]

How can I make this run in parallel, so that the notebook instances from the different pipelines start at the same time? More information below.

Notebook code:

import logging
import findspark

# Locate the local Spark installation before building the session
findspark.init()
findspark.find()

from pyspark.sql import SparkSession
from data_mesh_etl import table1, table2

spark = SparkSession.builder \
    .appName("MyApp") \
    .config("spark.jars.packages", "com.microsoft.sqlserver:mssql-jdbc:9.4.1.jre11,org.apache.hadoop:hadoop-azure:3.3.1") \
    .getOrCreate()

spark.conf.set('spark.sql.caseSensitive', True)
spark.conf.set('spark.sql.debug.maxToStringFields', 3000)

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# p_table_name is a notebook parameter supplied by the calling
# pipeline's Notebook activity; dispatch to the matching loader
if p_table_name == 'table1':
    table1.load_table1_data_into_sql(spark, logger)
elif p_table_name == 'table2':
    table2.load_table2_data_into_sql(spark, logger)

I pass the parameter p_table_name from pipeline_table1 with the value table1, and from pipeline_table2 with the value table2. When these two pipelines start at the same time, shouldn't my notebook also have two instances running in parallel? Is there some Spark concurrency setting that I am missing here?
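
For reference, the parameter reaches the notebook through its parameters cell. A minimal sketch of that cell (the default value shown is hypothetical; the pipeline's base parameter overrides it at run time):

# Parameters cell (toggle "Parameters" on this cell in Synapse Studio).
# The calling pipeline's Notebook activity overrides this default via
# its base parameter p_table_name; 'table1' below is only a placeholder.
p_table_name = 'table1'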

Can someone please help with this?

TIA!


Upvotes: 1

Views: 1907

Answers (1)

Sanket Kelkar

Reputation: 169

Answering my own question here: I got it working by simply increasing the size of the Spark pool. See the attached screenshot, where four Spark job definitions were started at the same time and ran in parallel.

[Screenshot: four Spark job definitions running in parallel on the resized pool]

Note: here I tried Spark job definitions, but the same approach works for notebooks as well.
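
Beyond resizing the pool, a complementary lever (my assumption, not part of the fix above) is to shrink each session with the %%configure magic so that several sessions fit into the pool concurrently. A minimal sketch; the resource values are illustrative and must fit within your pool's node size:

%%configure -f
{
    "driverMemory": "4g",
    "driverCores": 2,
    "executorMemory": "4g",
    "executorCores": 2,
    "numExecutors": 2
}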

Upvotes: 1
