SQLAstro

Reputation: 25

Issue with custom container for Delta Table in Azure Synapse

I currently have PySpark code that copies parquet files from one container to another and, at the same time, creates a Delta table in the destination container. Both containers are in the same ADLS Gen2 storage account. Even though I have set the destination container, when I run the notebook Azure Synapse still creates the folder in the default location.

from pyspark.sql import SparkSession

# Initialize Spark session
spark = SparkSession.builder.appName("DeltaTableCreation").config("spark.jars.packages", "io.delta:delta-core_2.12:1.0.0").getOrCreate()

# Specify your Data Lake Storage account and containers
data_lake_account = "youraccount"
source_container = "source-container"
destination_container = "your-custom-destination-container"

# Specify paths for source and destination containers
source_container_path = f"abfss://{source_container}@{data_lake_account}.dfs.core.windows.net/path/to/source"
destination_container_path = f"abfss://{destination_container}@{data_lake_account}.dfs.core.windows.net/path/to/destination"

# Function to recursively discover Parquet files in nested folders
def discover_parquet_files(base_path):
    return spark.read.format("parquet").option("recursiveFileLookup", "true").load(base_path)

# Read data from multiple Parquet files in the source container
source_df = discover_parquet_files(source_container_path)

# Save the data as a Delta table at the explicitly specified destination path
source_df.write.format("delta").mode("overwrite").save(destination_container_path)

# Stop the Spark session
spark.stop()

Upvotes: 0

Views: 79

Answers (1)

Rakesh Govindula

Reputation: 11514

"Azure Synapse still creates the folder in the default location."

That is expected behavior; this is how Delta tables are created in ADLS Gen2.

A Delta table is created as a folder of parquet files plus a _delta_log directory that holds the transaction log. If your delta path is abfss://<container>@<storage_account_name>.dfs.core.windows.net/<delta_table_name>, then the last folder in this path is your delta table.

Check the example below.

destination_container_path = "abfss://targetdata@<storage_account_name>.dfs.core.windows.net/mydelta"
df.write.format("delta").mode("overwrite").save(destination_container_path)

Here, a delta table named mydelta will be created in the targetdata container. This folder will contain part parquet files; the number of files depends on the size of the data.
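If you want to confirm this without opening the Storage browser, you can list the folder from the notebook. This is a minimal sketch, assuming you are running in a Synapse notebook where the built-in mssparkutils helper is available; the path is the placeholder one from the example above.

from notebookutils import mssparkutils  # built into Azure Synapse notebooks

# List the delta folder: expect a _delta_log directory alongside
# part-*.parquet data files (replace the placeholder with your account)
for file_info in mssparkutils.fs.ls("abfss://targetdata@<storage_account_name>.dfs.core.windows.net/mydelta"):
    print(file_info.name, file_info.size)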


You can read this delta table back from the same path, as below.

spark.read.format("delta").load("abfss://targetdata@<storage_account_name>.dfs.core.windows.net/mydelta").show()
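You can also load it through the Delta Lake API, which confirms the path holds a proper delta table rather than loose parquet files. A small sketch, assuming the delta-core package from the question is available in the session:

from delta.tables import DeltaTable

# Load the delta table by its storage path and show its transaction history
dt = DeltaTable.forPath(spark, "abfss://targetdata@<storage_account_name>.dfs.core.windows.net/mydelta")
dt.history().show()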


Go through this Reference to learn more about Delta tables.

Upvotes: 1
