Morpheus273

Reputation: 47

Synapse Spark: Python logging to log file in Azure Data Lake Storage

I am working in Synapse Spark and building a logger function to handle error logging. I intend to push the logs to an existing log file (data.log) located in AzureDataLakeStorageAccount/Container/Folder/.

In addition to the root logger I have added a StreamHandler, and I am trying to set up a FileHandler to manage the log file write-out.

The log file path I am specifying is in this format: 'abfss://<container>@<storage_account>.dfs.core.windows.net/Data/logging/data.log'

When I run the below code, I get the following error:

FileNotFoundError: [Errno 2] No such file or directory: '/mnt/var/hadoop/tmp/nm-local-dir/usercache/trusted-service-user/appcache/application_<number>/container_<number>/abfss:/<container>@<storage_account>.dfs.core.windows.net/Data/logging/data.log'

The default mount path is getting prefixed to the ADLS file path.

Here is the code:

'''

import logging
import sys

# LogFilepath is defined elsewhere at module level and holds the ADLS path shown above.

def init_logger(name: str, logging_level: int = logging.DEBUG) -> logging.Logger:
    _log_format = "%(levelname)s %(asctime)s %(name)s: %(message)s"
    _date_format = "%Y-%m-%d %I:%M:%S %p %z"
    _formatter = logging.Formatter(fmt=_log_format, datefmt=_date_format)

    _root_logger = logging.getLogger()
    _logger = logging.getLogger(name)
    _logger.setLevel(logging_level)

    # Root and stream handler: reuse the root logger's handlers if present,
    # otherwise attach a StreamHandler writing to stderr.
    if _root_logger.handlers:
        for handler in _root_logger.handlers:
            handler.setFormatter(_formatter)
    else:
        _stream_handler = logging.StreamHandler(sys.stderr)
        _stream_handler.setLevel(logging_level)
        _stream_handler.setFormatter(_formatter)
        _logger.addHandler(_stream_handler)

    # File handler appending to the log file in ADLS.
    _file_handler = logging.FileHandler(LogFilepath, 'a')
    _file_handler.setLevel(logging_level)
    _file_handler.setFormatter(_formatter)
    _logger.addHandler(_file_handler)

    return _logger

'''
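For completeness, the function is wired up roughly like this (a sketch; the container and account names are placeholders, and LogFilepath is the module-level variable the function reads):

'''

LogFilepath = 'abfss://<container>@<storage_account>.dfs.core.windows.net/Data/logging/data.log'

logger = init_logger(__name__)
logger.error("Something went wrong")

'''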

To work around the mount path prefix I prepended a series of '../' segments to move up the directory tree, but even with this I end up with a solitary '/' prefixed to my ADLS path.

I have not found any article or guidance where this has been implemented against Azure Data Lake Storage. Any assistance will be appreciated.

Upvotes: 1

Views: 1128

Answers (1)

Rakesh Govindula

Reputation: 11514

Looking at the error, the file path seems to be the issue here. Instead of giving the abfss path, try giving a path of the form /synfs/{jobid}/<mount_point>/<filepath>.

Use the below code for that.

from notebookutils import mssparkutils  # available by default in Synapse notebooks

# Build the local-style path through the synfs mount for the current Spark job
jobid = mssparkutils.env.getJobId()
LogFilepath = '/synfs/' + jobid + '/<mount_point>/filename.log'
print(LogFilepath)
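Note that this path assumes the ADLS container has already been mounted. If it is not, a minimal mount sketch (the mount point and linked service name below are placeholders):

from notebookutils import mssparkutils

# Mount the ADLS Gen2 container at /<mount_point> via a linked service.
mssparkutils.fs.mount(
    "abfss://<container>@<storage_account>.dfs.core.windows.net",
    "/<mount_point>",
    {"linkedService": "<linked_service_name>"}
)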


The above path might work for you. If not, you can try writing the log with open() as an alternative way of logging to an ADLS file.
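A minimal sketch of that open()-based alternative, assuming the same /synfs mount path built above (the helper name and message are illustrative):

import datetime

# Append one formatted record to the mounted log file.
def log_to_adls(message: str, level: str = "ERROR") -> None:
    timestamp = datetime.datetime.now().strftime("%Y-%m-%d %I:%M:%S %p")
    with open(LogFilepath, 'a') as f:
        f.write(f"{level} {timestamp}: {message}\n")

log_to_adls("Something went wrong")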

Upvotes: 0
