jaysc

Reputation: 75

When should you use a mount point in Azure Synapse Analytics?

The Azure Synapse Analytics documentation mentions two ways to read/write data to an Azure Data Lake Storage Gen2 account using an Apache Spark pool in Synapse Analytics.

  1. Reading the files directly using the ADLS store path
adls_path = "abfss://<containername>@<accountname>.dfs.core.windows.net/<filepath>"

df = spark.read.format("csv").load(adls_path)

  2. Creating a mount point using mssparkutils and reading the files using the synfs path
mssparkutils.fs.mount( 
    "abfss://<containername>@<accountname>.dfs.core.windows.net",  # source container
    "/data",                                                        # mount point name
    {"linkedService":"<accountname>"}                               # authenticate through a linked service
) 

synfs_path = "synfs:/<jobid>/data/<filepath>"

df = spark.read.format("csv").load(synfs_path) 
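For reference, the <jobid> part of the synfs path changes per Spark job. A rough sketch of how it could be resolved at runtime (assuming mssparkutils.env.getJobId() is available in the Spark pool session, and keeping <filepath> as a placeholder):

job_id = mssparkutils.env.getJobId()                # id of the current Spark job
synfs_path = f"synfs:/{job_id}/data/<filepath>"     # same path as above, built at runtime

df = spark.read.format("csv").load(synfs_path)

# the mount can be removed again when it is no longer needed
mssparkutils.fs.unmount("/data")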

What is the difference between the two methods? When should you prefer to use a mount point?

Upvotes: 1

Views: 1141

Answers (1)

shanmukh SS

Reputation: 28

A mount point is essentially a virtual folder that maps a location in Azure Storage.

Pros of accessing Storage from a mount point:

  1. Less complex code when accessing specific files from the Data Lake; there is no need to specify the full storage path every time you access them
  2. You can access files as if they were in local storage (see the sketch after this list)
  3. You can keep your data organized as folders in one centralized location

Cons:

  1. Not very efficient when you need to access multiple directories in Azure Storage; mapping many directories quickly becomes confusing and messy
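For example, a rough sketch of accessing the mount from the question (assuming mssparkutils.env.getJobId() is available and <filepath> is the file you want):

jobId = mssparkutils.env.getJobId()

# read through the mount with a short path instead of the full abfss URI
df = spark.read.format("csv").load(f"synfs:/{jobId}/data/<filepath>")

# or open a mounted file as if it were on the local file system
with open(f"/synfs/{jobId}/data/<filepath>") as f:
    first_line = f.readline()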

Upvotes: 1
