Reputation: 213
I have a parquet file and created a new External table, but the performance is very slow as compare to a normal table in the synapse. Can you please let me know how to over come this.
Upvotes: 2
Views: 5406
Reputation: 55
a) The first step is to partition your Parquet File using a relevant partition column, such as Year, Month, and Date.
b) I recommend using a View rather than an external table as a second recommendation. External Tables don't support Partition Pruning and won't use the partition columns to eliminate unnecessary files during the read.
c) Assure that data types are enforced, and that string types are being used appropriately.
d) If possible, convert your Parquet file to Delta format. Synapse is able to read Partition columns from Delta without the need for the filepath()
and filename()
functions. External tables do not support Delta, only views.
Note: External tables doesn't support Parquet partition columns.
SELECT *,
CAST(fct.filepath(1) AS SMALLINT) AS SalesOrderPathYear,
CAST(fct.filepath(2) AS TINYINT) AS SalesOrderPathMonth,
CAST(fct.filepath(3) AS DATE) AS SalesOrderPathDate
FROM
OPENROWSET
(
BULK 'conformed/facts/factsales/*/*/*/*.parquet',
DATA_SOURCE = 'ExternalDataSourceDataLake',
FORMAT = 'Parquet'
) AS fct
WITH
(
ColA as String(10),
ColB as Integer,
ColC as ...
)
Upvotes: 0
Reputation: 17401
Very broad question. So I'll give broad answer:
CREATE TABLE
. If you're querying data from one or more tables repeatedly and each query is different (group-by, join, selected columns) then you can't get beat performance of "normal" table with external tables.Use parquet format, which you're doing.
Pick right partition column and partition your data by storing partitions to different folders or file names.
If a query targets a single large file, you'll benefit from splitting it into multiple smaller files.
Try to keep your CSV (if using csv) file size between 100 MB and 10 GB.
Use correct data type.
Manually create statistics for CSV files
Use CETAS to enhance query performance and joins
...and many more.
Upvotes: 2