Morpheus273

Reputation: 47

Synapse Spark exception handling - Can't write to log file

I have written PySpark code that hits a REST API, extracts the contents as XML, and then writes them to Parquet in a data lake container.

I am trying to add logging functionality where I write out not only errors but also updates about the actions/processes we execute.

I am comparatively new to Spark, so I have been relying on online articles and samples. They all explain error handling and logging through "1/0" examples and save the logs in the default folder structure (not in an ADLS account/container/folder), which does not help at all. Most of the sample code is pure Python and doesn't run as-is.

Could I get some assistance with setting up the following:

  1. Push errors to a log file under a designated folder sitting in a data lake storage account/container/folder hierarchy.
  2. Catching REST-specific exceptions (roughly what I sketch below).
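
For point 2, I imagine something like the below (using the requests library, with api_url standing in for my endpoint), but I am not sure this is the right approach:

import requests

try:
    response = requests.get(api_url)  # api_url would be my REST endpoint
    response.raise_for_status()       # raises HTTPError for 4xx/5xx responses
except requests.exceptions.RequestException as e:
    # this is where I want to write to the log file in the data lake instead
    print('My Error...' + str(e))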

This is a sample of the logging code I have written:


LogFilepath = "abfss://[email protected]/Data/logging/data.log"

#LogFilepath2 = "adl://.azuredatalakestore.net/raw/Data/logging/data.log"

print(LogFilepath)

try:
    1/0  # placeholder error to test the logging
except Exception as e:
    print('My Error...' + str(e))
    with open(LogFilepath, "a") as f:
        f.write("An error occurred: {}\n".format(e))


I have tried both the ABFSS and ADL file paths with no luck. The log file already exists in the storage account/container/folder.

Upvotes: 0

Views: 917

Answers (1)

Rakesh Govindula

Reputation: 11474

I reproduced the above using the abfss path in the with open() function, but it gave me the error below.

FileNotFoundError: [Errno 2] No such file or directory: 'abfss://[email protected]/datalogs.logs'

As per this documentation, we can use open() on an ADLS file with a path like /synfs/{jobId}/mountpoint/{filename}.

For that, we first need to mount the ADLS storage.

Here I have mounted it using an ADLS linked service. You can mount with either a storage account access key or a SAS token, as per your requirement.

mssparkutils.fs.mount(
    "abfss://<container_name>@<storage_account_name>.dfs.core.windows.net",
    "/mountpoint",
    {"linkedService": "<ADLS linked service name>"}
)
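
Optionally, you can confirm the mount and resolve its local path before writing; mssparkutils.fs.mounts() lists the active mount points and mssparkutils.fs.getMountPath() returns the /synfs/{jobId}/... path:

# Optional check: list active mounts and resolve the local /synfs path
print(mssparkutils.fs.mounts())
print(mssparkutils.fs.getMountPath("/mountpoint"))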

Now use the below code to achieve your requirement.

from datetime import datetime

jobid = mssparkutils.env.getJobId()

# Local path to the mounted container: /synfs/{jobId}/{mount point}/{file name}.
# 'synapsedata' must match the mount point name used in mssparkutils.fs.mount().
LogFilepath = '/synfs/' + jobid + '/synapsedata/datalogs.log'
print(LogFilepath)

try:
    1/0  # placeholder error to test the logging
except Exception as e:
    print('My Error...' + str(e))
    with open(LogFilepath, "a") as f:
        # capture the timestamp at the moment the error is logged
        f.write("Time : {} - Error : {}\n".format(datetime.now(), e))

Here I am writing the date and time along with the error. There is no need to create the log file first; the code above will create the file and append the error to it.
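
To also cover the REST-specific exceptions from your second point, you can catch the requests library's exception classes explicitly instead of a blanket Exception. A minimal sketch, assuming the API is called with requests (the URL is a placeholder):

import requests
from datetime import datetime

try:
    response = requests.get("https://<your-api>/endpoint", timeout=30)  # placeholder URL
    response.raise_for_status()  # turns 4xx/5xx responses into HTTPError
    xml_content = response.text
except requests.exceptions.Timeout as e:
    error = 'Request timed out: {}'.format(e)
except requests.exceptions.HTTPError as e:
    error = 'HTTP error: {}'.format(e)
except requests.exceptions.RequestException as e:  # base class: connection errors etc.
    error = 'Request failed: {}'.format(e)
else:
    error = None

if error:
    print('My Error...' + error)
    with open(LogFilepath, "a") as f:
        f.write("Time : {} - Error : {}\n".format(datetime.now(), error))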

If you want to generate logs daily, you can build date-based file names for the log files as per your requirement.
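
For example, a small sketch that stamps the current date into the file name (reusing jobid from above), so each day gets its own file:

from datetime import datetime

# e.g. /synfs/<jobId>/synapsedata/datalogs-2023-05-04.log
today = datetime.now().strftime("%Y-%m-%d")
LogFilepath = '/synfs/' + jobid + '/synapsedata/datalogs-' + today + '.log'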

My Execution:

[Screenshots: notebook execution and the resulting datalogs.log file; here I have executed it 2 times.]

Upvotes: 0
