Overwriting a file in Azure Data Lake Storage Gen2 from a Synapse notebook throws an exception

Question

As part of migrating from Azure Databricks to Azure Synapse Analytics Notebooks, I'm facing the issue explained below.

I read a CSV file from Azure Data Lake Storage Gen2 into a PySpark DataFrame using the following command:

df = spark.read.format('csv').option("delimiter", ",").option("multiline", "true").option("quote", '"').option("header", "true").option("escape", "\\").load(csvFilePath)

After processing the data, we need to overwrite the same file, and we use the following command:

df.coalesce(1).write.option("delimiter", ",").csv(csvFilePath, mode = 'overwrite', header = 'true')

This deletes the existing file at the path csvFilePath and then fails with the error "Py4JJavaError: An error occurred while calling o617.csv."

Things I've noticed:

Once the CSV file at csvFilePath is deleted by the overwrite command, the data in the DataFrame df is gone as well.
It looks like the DataFrame still refers to the source file when the write runs, whereas in Databricks we did not have this issue and the overwrite ran successfully.
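
If the cause really is Spark's lazy evaluation (my assumption from the behaviour above, not something I have confirmed), then one thing to try would be to stage the result at a separate location first, so the final overwrite no longer reads from the path it is deleting. A minimal sketch, where overwrite_source_csv and tmp_path are hypothetical names introduced only for illustration:

```python
def overwrite_source_csv(spark, df, csv_path, tmp_path):
    """Overwrite csv_path with df, going through a staging copy at tmp_path.

    Assumption: df was lazily read from csv_path, so writing it straight
    back with mode="overwrite" deletes its own input before it is read.
    tmp_path is a hypothetical staging location different from csv_path.
    """
    # 1. Materialize the processed data somewhere other than the source path.
    df.coalesce(1).write.option("delimiter", ",").csv(
        tmp_path, mode="overwrite", header=True
    )

    # 2. Read the staged copy back. This DataFrame depends only on tmp_path.
    staged = (
        spark.read.format("csv")
        .options(
            delimiter=",",
            multiLine="true",
            quote='"',
            header="true",
            escape="\\",
        )
        .load(tmp_path)
    )

    # 3. Overwrite the original location from the staged copy.
    staged.coalesce(1).write.option("delimiter", ",").csv(
        csv_path, mode="overwrite", header=True
    )
    return staged
```

In the notebook this would be called as overwrite_source_csv(spark, df, csvFilePath, tmpPath), with tmpPath being any writable staging folder in the same storage account. The staged copy can be deleted afterwards.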

Error returned by the Synapse notebook at the write command (screenshot): https://i.sstatic.net/Obj9q.png
