Reputation: 47
I have written PySpark code that calls a REST API, extracts the contents in XML format, and then writes them to Parquet in a data lake container.
I am trying to add logging that records not only errors but also updates on the actions/processes we execute.
I am comparatively new to Spark and have been relying on online articles and samples. All of them explain error handling and logging through "1/0" examples and save logs in the default folder structure (not in an ADLS account/container/folder), which does not help at all. Most of the code written in pure Python doesn't run as-is.
Could I get some assistance with setting up the following:
This is a sample of what I have written:
''''
LogFilepath = "abfss://[email protected]/Data/logging/data.log"
#LogFilepath2 = "adl://.azuredatalakestore.net/raw/Data/logging/data.log"
print(LogFilepath)
try:
    1/0
except Exception as e:
    print('My Error...' + str(e))
    with open(LogFilepath, "a") as f:
        f.write("An error occurred: {}\n".format(e))
''''
I have tried both ABFSS and ADL file paths with no luck. The log file already exists in the storage account/container/folder.
Upvotes: 0
Views: 917
Reputation: 11474
I reproduced the above using the abfss path in the with open() function, but it gave me the error below.
FileNotFoundError: [Errno 2] No such file or directory: 'abfss://[email protected]/datalogs.logs'
As per this documentation, we can use open() on an ADLS file with a path like /synfs/{jobId}/mountpoint/{filename}.
For that, we first need to mount the ADLS container.
Here I have mounted it using an ADLS linked service. You can also mount it with a storage account access key or a SAS token, as per your requirement.
mssparkutils.fs.mount(
"abfss://<container_name>@<storage_account_name>.dfs.core.windows.net",
"/mountpoint",
{"linkedService":"<ADLS linked service name>"}
)
Now use the below code to achieve your requirement.
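from datetime import datetime

# Local path of the mounted file: /synfs/{jobId}/{mount point}/{filename}
jobid = mssparkutils.env.getJobId()
# '/mountpoint' matches the mount point passed to mssparkutils.fs.mount above
LogFilepath = '/synfs/' + jobid + '/mountpoint/datalogs.log'
print(LogFilepath)
try:
    1/0
except Exception as e:
    print('My Error...' + str(e))
    # Append mode creates the file if it does not exist yet
    with open(LogFilepath, "a") as f:
        f.write("Time : {} - Error : {}\n".format(datetime.now(), e))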
Here I am writing the date and time along with the error. There is no need to create the log file first: the above code creates the file if it is missing and appends the error to it.
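Since the question also asks for progress updates and not only errors, the same mounted path could be handed to Python's standard logging module. This is only a sketch: the helper name build_file_logger and the logger name "pipeline" are made up for illustration, and a temporary file stands in for the /synfs/... path so the snippet runs anywhere. I have not verified how FileHandler behaves on a Synapse mount, so treat that part as an assumption.

```python
import logging
import os
import tempfile


def build_file_logger(log_path, name="pipeline"):
    """Return a logger that appends timestamped records to log_path.

    Hypothetical helper, not part of mssparkutils or Spark.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Avoid stacking duplicate handlers when a notebook cell is re-run.
    if not logger.handlers:
        handler = logging.FileHandler(log_path, mode="a")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


# Assumption: in Synapse this would be the mounted path, for example
# '/synfs/' + jobid + '/mountpoint/datalogs.log'. A temp file stands in here.
log_path = os.path.join(tempfile.mkdtemp(), "datalogs.log")
logger = build_file_logger(log_path)

logger.info("Starting API extract")      # progress update
try:
    1 / 0
except Exception:
    logger.exception("Extract failed")   # error message plus traceback
```

logger.info covers the "updates of actions" part, and logger.exception writes the error together with its traceback, so both kinds of entries end up in the same file.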
If you want to generate the logs daily, you can include the date in the log file name, as per your requirement.
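A minimal sketch of a date-based file name is below. The helper daily_log_path and the "datalogs" prefix are hypothetical, and a temporary directory stands in for the mounted folder (for example '/synfs/' + jobid + '/mountpoint'), so adjust both to your setup.

```python
import os
import tempfile
from datetime import datetime


def daily_log_path(base_dir, prefix="datalogs", when=None):
    """Build a path like <base_dir>/datalogs_2024-01-05.log (hypothetical helper)."""
    when = when or datetime.now()
    file_name = "{}_{}.log".format(prefix, when.strftime("%Y-%m-%d"))
    return os.path.join(base_dir, file_name)


# Assumption: base_dir would be the mounted folder in Synapse.
base_dir = tempfile.mkdtemp()
log_path = daily_log_path(base_dir)

# Every run on the same day appends to the same file; a new day starts a new file.
with open(log_path, "a") as f:
    f.write("Time : {} - Run started\n".format(datetime.now()))
```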
My Execution:
Here I have executed it twice.
Upvotes: 0