squarespiral
squarespiral

Reputation: 267

python: Handling log output of module during program execution

I'm setting up a logger in my script like shown at the bottom. This works fine for my purposes and logs my __main__ log messages and those of any modules I use to stdout and a log file.

During program execution a module call that I'm using xarray.open_dataset(file, engine="cfgrib") raises an Error in some conditions and produces the following log output:

2023-02-18 10:02:06,731 cfgrib.dataset ERROR skipping variable: paramId==228029 shortName='i10fg'
Traceback (most recent call last):
...

How can I access this output during program execution?

The raised error in the cfgrib module is handled there gracefully and program execution can continue, but the logic of my program requires that I access the error message, in particular the part saying shortName='i10fg' in order to handle the error exhaustively.

Here is how my logger is set up:

def init_log():
    """initialize logging

    returns logger using log settings from the config file (settings.toml)
    """
    # all settings from a settings file with reasonable defaults
    lg.basicConfig(
        level=settings.logging.log_level,
        format=settings.logging.format,
        filemode=settings.logging.logfile_mode,
        filename=settings.logging.filename,
    )

    mylogger = lg.getLogger(__name__)
    stream = lg.StreamHandler()
    mylogger.addHandler(stream)

    clg.install(
        level=settings.logging.log_level,
        logger=mylogger,
        fmt="%(asctime)s %(levelname)s:\t%(message)s",
    )
    return mylogger

# main
log = init_log()
log.info('...reading files...')

I went through the python logging documentation and cookbook. While this contains ample examples on how to modify logging for various purposes, I could not find an example for accessing and reacting to a log message during program execution.

The Exception in my logs look this:

2023-02-20 12:22:37,209 cfgrib.dataset ERROR skipping variable: paramId==228029 shortName='i10fg'
Traceback (most recent call last):
  File "/home/foo/projects/windgrabber/.venv/lib/python3.10/site-packages/cfgrib/dataset.py", line 660, in build_dataset_components
    dict_merge(variables, coord_vars)
  File "/home/foo/projects/windgrabber/.venv/lib/python3.10/site-packages/cfgrib/dataset.py", line 591, in dict_merge
    raise DatasetBuildError(
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='time' value=Variable(dimensions=('time',), data=array([1640995200, 1640998800, 1641002400, ..., 1672520400, 1672524000,
       1672527600])) new_value=Variable(dimensions=('time',), data=array([1640973600, 1641016800, 1641060000, 1641103200, 1641146400,
       1641189600, 1641232800, 1641276000, 1641319200, 1641362400,

I cannot catch the Exception directly for some reason:

...
import sys
from cfgrib.dataset import DatasetBuildError
...

    try:
        df = xr.open_dataset(file, engine="cfgrib").to_dataframe()
        # triggering error manually like with the two lines below works as expected
        # raise Exception()
        # raise DatasetBuildError()
    except Exception as e:
        print('got an Exception')
        print(e)
        print(e.args)
    except BaseException as e:
        print('got a BaseException')
        print(e.args)
    except DatasetBuildError as e:
        print(e)
    except:
        print('got any and all exception')
        type, value, traceback = sys.exc_info()
        print(type)
        print(value)
        print(traceback)

Unless I uncomment the two lines where I raise the exception manually, the except clauses are never triggered, event though I can see the DatabaseBuildError in my logs.

Not sure if this has any bearing, but while I can see the Exception as quoted above in my file log, it is not printed to stdout.

EDIT:

As suggested by @dskrypa in the comments, passing errors='raise' to the function call indeed allows me to handle the exception. Unfortunately, that leaves me at square one: the call to xr.open_dataset() is very expensive (takes a long time). It returns a bunch of time series for variables. When the exception occurs the call returns whatever it can and I have to get the rest of the variables in a separate call.

Now when I raise the error and the call fails, it doesn't return any time series and I have to make 2 more calls to recover (one to get the first bunch of vars while ignoring the exception and a second one to get the rest).

Conversely when I don't raise the error (exception is logged but call finished partly = the original problem) I have to repeat every call in another try statement with the assumption that it might not have gotten all the vars/timeseries. Unfortunately this exception occurs in roughly half the cases.

What I'd need is a way to determine at runtime if there was an exception logged while still allowing the call to continue and deliver the first half of vars (or all of them if there was no exception). Only if there was exception log, I'd make another call to get the rest of the variables. This is basically the original problem statement again.

Looks like I might have to refactor this whole thing, convert grib to netcdf (reportedly faster) and then read from there. Hopefully, even if I run into the same exception, it will have less of an impact time-wise.

Upvotes: 0

Views: 166

Answers (0)

Related Questions