Ajay

Reputation: 267

Performance issue in Synapse serverless SQL pool while reading CSV stored in ADLS

I have enabled the Export to Data Lake feature in D365 F&O and created an external table in a serverless SQL pool database in Synapse to read the CSV. It had been working fine for six months, but now I am facing a performance issue due to the huge amount of data: we join multiple tables (approx. 10) that each hold millions of rows. Getting a result takes around 30 seconds, sometimes 40; it varies.

I am using logic app and function app to invoke the SQL queries.

My understanding was that a Synapse serverless SQL pool would automatically handle the load and I would get results within 3-4 seconds, but it is taking 40 seconds.

I checked all the best practices, but they didn't help:

https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/best-practices-serverless-sql-pool

Upvotes: 2

Views: 3517

Answers (2)

Ajay

Reputation: 267

Finally, I moved the ADLS CSV data into a dedicated SQL pool database using a Synapse pipeline and data flow, as we have complex joins and the approach suggested above did not work. In the dedicated SQL pool we can scale the performance level, and we can implement materialized views, statistics, and indexes, which helped me meet the API performance expectation.
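For anyone taking the same route, here is a minimal sketch of the dedicated SQL pool objects mentioned above. All table, column, and object names are hypothetical placeholders, not the actual F&O schema:

```sql
-- Materialized view pre-aggregating a common join (dedicated SQL pool only).
-- Table and column names are illustrative placeholders.
CREATE MATERIALIZED VIEW dbo.mvSalesByCustomer
WITH (DISTRIBUTION = HASH(CustomerId))
AS
SELECT s.CustomerId,
       COUNT_BIG(*)  AS OrderCount,
       SUM(s.Amount) AS TotalAmount
FROM dbo.SalesLine AS s
GROUP BY s.CustomerId;

-- Manually created statistics help the optimizer on frequently joined columns.
CREATE STATISTICS stCustomerId ON dbo.SalesLine (CustomerId);
```

Hash-distributing the view on the join key avoids data movement when the API query joins on that column.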

In serverless, I was able to improve performance somewhat by using Parquet files instead of CSV. So I decided it was better to move the data into the dedicated SQL pool.
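If you stay on serverless, the CSV-to-Parquet conversion can be done with CETAS in the serverless pool itself. The data source, paths, and object names below are hypothetical; this is a sketch, not the exact pipeline I used:

```sql
-- CETAS: write the CSV data back out as Parquet, queryable via a new external table.
-- Data source, file format, and paths are placeholders for illustration.
CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (FORMAT_TYPE = PARQUET);

CREATE EXTERNAL TABLE dbo.SalesLineParquet
WITH (
    LOCATION = 'curated/salesline/',
    DATA_SOURCE = MyAdlsDataSource,   -- assumes an existing external data source
    FILE_FORMAT = ParquetFormat
)
AS
SELECT *
FROM OPENROWSET(
    BULK 'raw/salesline/*.csv',
    DATA_SOURCE = 'MyAdlsDataSource',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE
) AS rows;
```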

Thanks everyone for your suggestions and time.

Upvotes: 2

Utkarsh Pal

Reputation: 4544

The approach needs to change. You don't need additional services like Logic Apps and Functions; instead, you can use a Synapse notebook directly for this.

You can use either of the following approaches to read data from Azure Data Lake Storage.

  1. OPENROWSET

The OPENROWSET(BULK...) function allows you to access files in Azure Storage. It reads the content of a remote data source (for example, a file) and returns that content as a set of rows. Within the serverless SQL pool resource, the OPENROWSET bulk rowset provider is accessed by calling the OPENROWSET function and specifying the BULK option.

Refer to the example below.

--OPENROWSET syntax for reading delimited text files
OPENROWSET  
( { BULK 'unstructured_data_path' , [DATA_SOURCE = <data source name>, ] 
    FORMAT = 'CSV'
    [ <bulk_options> ]
    [ , <reject_options> ] }  
)  
WITH ( {'column_name' 'column_type' [ 'column_ordinal' | 'json_path'] })  
[AS] table_alias(column_alias,...n)
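For example, a concrete query over the exported CSV might look like this. The storage URL, column names, and types are hypothetical placeholders for your own export path and schema:

```sql
-- Query a CSV exported to ADLS directly (path and columns are placeholders).
SELECT TOP 100 r.RecId, r.CustomerId, r.Amount
FROM OPENROWSET(
    BULK 'https://mystorageaccount.dfs.core.windows.net/dynamics365-financeandoperations/salesline/*.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',   -- the faster CSV parser in serverless SQL pool
    HEADER_ROW = TRUE
)
WITH (
    RecId      BIGINT,
    CustomerId VARCHAR(20),
    Amount     DECIMAL(18, 2)
) AS r;
```

Specifying an explicit WITH schema (instead of letting the pool infer one) also tends to help serverless query performance.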
  2. Read data as an external table

An external table points to data located in Hadoop, Azure Storage blob, or Azure Data Lake Storage. External tables are used to read data from files or write data to files in Azure Storage. With Synapse SQL, you can use external tables to read external data using dedicated SQL pool or serverless SQL pool.
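A minimal sketch of setting up such an external table in a serverless SQL pool follows. All names, the storage URL, and the schema are hypothetical placeholders:

```sql
-- Serverless external table over a CSV folder in ADLS (all names are placeholders).
CREATE EXTERNAL DATA SOURCE MyAdlsDataSource
WITH (LOCATION = 'https://mystorageaccount.dfs.core.windows.net/dynamics365-financeandoperations');

CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', FIRST_ROW = 2)  -- skip the header row
);

CREATE EXTERNAL TABLE dbo.SalesLineExternal (
    RecId      BIGINT,
    CustomerId VARCHAR(20),
    Amount     DECIMAL(18, 2)
)
WITH (
    LOCATION    = 'salesline/*.csv',
    DATA_SOURCE = MyAdlsDataSource,
    FILE_FORMAT = CsvFormat
);
```

Once created, the table can be queried and joined like any other table in the database.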

Refer: External tables in dedicated SQL pool and serverless SQL pool

Upvotes: 0
