Mayank Mathur

Reputation: 1

Overwriting a file in Azure Data Lake Storage Gen2 from a Synapse notebook throws an exception

As part of migrating from Azure Databricks to Azure Synapse Analytics Notebooks, I'm facing the issue explained below.

I read a CSV file from Azure Data Lake Storage Gen2 into a PySpark DataFrame using the following command:

df = spark.read.format('csv').option("delimiter", ",").option("multiline", "true").option("quote", '"').option("header", "true").option("escape", "\\").load(csvFilePath)

After processing this file, we need to overwrite it, which we do with the following command:

df.coalesce(1).write.option("delimiter", ",").csv(csvFilePath, mode = 'overwrite', header = 'true')

This deletes the existing file at the path "csvFilePath" and then fails with the error "Py4JJavaError: An error occurred while calling o617.csv."

Things I've noticed:

  1. Once the overwrite command deletes the CSV file at path "csvFilePath", the data in the DataFrame "df" is gone as well.
  2. The DataFrame appears to reference the file at runtime, whereas in Databricks we did not have this issue and the overwrite ran successfully.

[Error returned by Synapse Notebook at write command.][1]

[1]: https://i.sstatic.net/Obj9q.png
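The two observations above are consistent with Spark's lazy evaluation: "df" is only a plan that reads from "csvFilePath" when the write runs, so deleting the files first leaves nothing to read. A minimal sketch of one commonly used workaround, assuming that is indeed the cause here, is to stage the processed data at a different location and overwrite the original path from that staged copy. The helper name `overwrite_csv_via_temp` and the `temp_path` argument are hypothetical and not part of the original code; `temp_path` is assumed to be a separate, writable location in the same storage account.

```python
def overwrite_csv_via_temp(spark, df, target_path, temp_path):
    """Overwrite target_path with df without reading and deleting the same files in one job.

    Hypothetical helper: spark is the notebook's SparkSession, df is the
    processed DataFrame that was originally loaded from target_path.
    """
    # Step 1: materialize the processed data at a different location first,
    # while the original files under target_path still exist.
    df.write.option("delimiter", ",").csv(temp_path, mode="overwrite", header="true")

    # Step 2: re-read the staged copy with the same options as the original
    # read, so the new DataFrame no longer depends on target_path.
    staged = (
        spark.read.format("csv")
        .option("delimiter", ",")
        .option("multiline", "true")
        .option("quote", '"')
        .option("header", "true")
        .option("escape", "\\")
        .load(temp_path)
    )

    # Step 3: overwrite the original location from the staged copy.
    staged.coalesce(1).write.option("delimiter", ",").csv(
        target_path, mode="overwrite", header="true"
    )
    return staged


# Example call in the notebook (the staging path is an assumption):
# overwrite_csv_via_temp(spark, df, csvFilePath, csvFilePath + "_staging")
```

This sketch leaves the staged files in place; removing them afterwards is a separate step and is not shown here.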

Upvotes: 0

Views: 1034

Answers (1)

Sairam Tadepalli

Reputation: 1683

I suggest mounting the data storage. Please refer to the documentation below.

https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-use-databricks-spark

Upvotes: 0
