camh

Reputation: 42478

Pythonic error handling of complex functions

I'd like to know if there is a Pythonic way of handling errors in long-running functions, where an error in one part does not affect the function's ability to continue with the rest.

As an example, consider a function that, given a list of URLs, recursively retrieves each resource and all linked resources under the path of the top-level URLs. It stores the retrieved resources in a local filesystem with a directory structure mirroring the URL structure. Essentially this is a basic recursive wget for a list of pages.

There are quite a number of points where this function could fail:

A failure on retrieving or saving any one resource only affects the function's ability to continue to process that resource and any child resources that may be linked from it, but it is possible to continue to retrieve other resources.

A simple model of error handling is that on the first error, an appropriate exception is raised for the caller to handle. The problem with this is that it terminates the function and does not allow it to continue. The error could possibly be fixed and the function restarted from the beginning but this would cause work to be redone, and any permanent errors may mean we never complete.

A couple of alternatives I have in mind are:

In Python discussions, I've often noted certain approaches described as Pythonic or non-Pythonic. I'd like to know if there are any particularly Pythonic approaches to handling the type of scenario described above.

Does Python have any batteries included that model more sophisticated error handling than the terminate model of exception handling, or do the more complex batteries included use a model of error handling that I should copy to stay Pythonic?

Note: Please do not focus on the example. I'm not looking to solve problems in that particular space, but it seemed like a good example that most people here would have an understanding of.

Upvotes: 6

Views: 928

Answers (2)

ncoghlan

Reputation: 41496

I don't think there's a particularly clear "Pythonic/non-Pythonic" distinction at the level you're talking about here.

One of the big reasons there's no "one-size-fits-all" solution in this domain is that the exact semantics you want are going to be problem-specific.

  • For one situation, abort-on-first-failure may be adequate.
  • For another, you may want abort-and-rollback if any of the operations fails.
  • For a third, you may want to complete as many as possible and simply log-and-ignore failures.
  • For a fourth alternative, you may want to complete as many as possible, but raise an exception at the end to report any that failed.

Even supporting an error handler doesn't necessarily cover all of those desired behaviours - a simple per-failure error handler can't easily provide abort-and-rollback semantics, or generate a single exception at the end. (It's not impossible - you just have to mess around with tricks like passing bound methods or closures as your error handlers.)
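As a minimal sketch of the bound-method trick, assuming a simple batch function: a per-failure handler can accumulate state, so the caller can still raise a single exception once the batch is done. All the names here (`process_all`, `FailureCollector`, `BatchError`) are hypothetical and only for illustration.

```python
class BatchError(Exception):
    """Hypothetical exception reporting every failure from one batch."""

    def __init__(self, failures):
        super().__init__(f"{len(failures)} item(s) failed")
        self.failures = failures


class FailureCollector:
    """Hypothetical helper whose bound method serves as the error handler."""

    def __init__(self):
        self.failures = []

    def record(self, item, exc):
        # Called once per failure; remember it and let the work continue.
        self.failures.append((item, exc))

    def raise_if_failed(self):
        # Turn the collected failures into one exception at the end.
        if self.failures:
            raise BatchError(self.failures)


def process_all(items, func, on_error):
    """Apply func to every item, reporting each failure to on_error."""
    results = []
    for item in items:
        try:
            results.append(func(item))
        except Exception as exc:
            on_error(item, exc)
    return results


collector = FailureCollector()
results = process_all(["1", "x", "3"], int, collector.record)
# results holds the successes; calling collector.raise_if_failed() now
# would raise BatchError carrying the single failure for "x".
```

The batch function itself only knows about a plain callable; the "single exception at the end" policy lives entirely in the caller's collector object.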

So the best you can do is take an educated guess at typical usage scenarios and desirable behaviours in the face of errors, and design your API accordingly.

A fully general solution would accept an on-error handler that is given each failure as it happens, and a final "errors occurred" handler that gives the caller a chance to decide how multiple errors are handled (with some protocol to allow data to be passed from the individual error handlers to the final batch error handler).

However, providing such a general solution is likely to be an API design failure. The designer of the API shouldn't be afraid to have an opinion on how their API should be used, and how errors should be handled. The main thing to keep in mind is to not overengineer your solution:

  • if the naive approach is adequate, don't mess with it
  • if collecting failures in a list and reporting a single error is good enough, do that
  • if you need to rollback everything if one part fails, then just implement it that way
  • if there's a genuine use case for custom error handling, then accept an error handler as a part of the API. But have a specific use case in mind when you do this; don't just do it for the sake of it. And when you do, have a sensible default handler that is used if the user doesn't specify one (this may just be the naive "raise immediately" approach)
  • if you do offer selectable error handlers, consider offering some standard error handlers that can be passed in either as callables or as named strings (i.e. along the lines of the error handler selection for text codecs)
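The last two points could look something like the following rough sketch, loosely modelled on the `errors` argument of `str.encode`. The function `process_all`, the handler names "strict" and "ignore", and the `_HANDLERS` registry are all assumptions made up for this example.

```python
def _strict(item, exc):
    # Default: the naive "raise immediately" behaviour.
    raise exc


def _ignore(item, exc):
    # Skip the failed item and carry on.
    return None


# Hypothetical registry of standard handlers, selectable by name.
_HANDLERS = {"strict": _strict, "ignore": _ignore}


def process_all(items, func, errors="strict"):
    """Apply func to each item; errors is a handler name or a callable."""
    handler = _HANDLERS[errors] if isinstance(errors, str) else errors
    results = []
    for item in items:
        try:
            results.append(func(item))
        except Exception as exc:
            handler(item, exc)
    return results


# Named standard handler:
parsed = process_all(["1", "x", "3"], int, errors="ignore")

# Custom callable for a caller with a specific need:
failed = []
process_all(["1", "x", "3"], int, errors=lambda item, exc: failed.append(item))
```

Callers who don't care get the default, callers with common needs pick a name, and only those with unusual needs have to write a handler.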

Perhaps the best you're going to get as a general principle is that "Pythonic" error handling will be as simple as possible, but no simpler. But at that point, the word is just being used as a synonym for "good code", which isn't really its intent.

On the other hand, it is slightly easier to talk about what actual forms non-Pythonic error handling might take:

def myFunction(an_arg, error_handler):
    # Do stuff
    if err_occurred:
        # The function decides the dispatch and demands one method per error type
        if isinstance(err, RuntimeError):
            error_handler.handleRuntimeError()
        elif isinstance(err, IOError):
            error_handler.handleIOError()

The Pythonic idiom is that error handlers, if supported at all, are just simple callables. Give them the information they need to decide how to handle the situation, rather than try to decide too much on their behalf. If you want to make it easier to implement common aspects of the error handling, then provide a separate helper class with a __call__ method that does the dispatch, so people can decide whether or not they want to use it (or how much they want to override when they do use it). This isn't completely Python-specific, but it is something that folks coming from languages that make it annoyingly difficult to pass arbitrary callables around (such as Java, C, C++) may get wrong. So complex error handling protocols would definitely be a way to head into "non-Pythonic error handling" territory.

The other problem in the above non-Pythonic code is that there's no default handler provided. Forcing every API user to make a decision they may not yet be equipped to make is just poor API design. But now we're back in general "good code"/"bad code" territory, so Pythonic/non-Pythonic really shouldn't be used to describe the difference.
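To make the callable-handler idiom and the default handler concrete, here is a minimal sketch. The names `process_all`, `raise_immediately`, `DispatchingHandler` and `load` are hypothetical; the dispatching class is an optional helper for the caller, not something the API requires.

```python
def raise_immediately(item, exc):
    """Default handler: the naive behaviour."""
    raise exc


def process_all(items, func, on_error=raise_immediately):
    """The API only asks for a callable taking (item, exc)."""
    results = []
    for item in items:
        try:
            results.append(func(item))
        except Exception as exc:
            on_error(item, exc)
    return results


class DispatchingHandler:
    """Hypothetical optional helper: dispatches on exception type in __call__."""

    def __call__(self, item, exc):
        if isinstance(exc, OSError):
            return self.handle_os_error(item, exc)
        return self.handle_other(item, exc)

    def handle_os_error(self, item, exc):
        # Override to customise; by default such failures are skipped.
        print(f"skipping {item!r}: {exc}")

    def handle_other(self, item, exc):
        # Anything unexpected is re-raised by default.
        raise exc


def load(name):
    # Stand-in for real work that can fail with an OSError subclass.
    if name == "missing":
        raise FileNotFoundError(name)
    return name.upper()


loaded = process_all(["a", "missing", "b"], load, on_error=DispatchingHandler())
```

A plain function or lambda works just as well as `on_error`; users who want the type dispatch can subclass the helper and override only the methods they care about.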

Upvotes: 6

Andrea Zonca

Reputation: 8773

Error handling should rely on exceptions and logging, so for each error raise an exception and log an error message.

Then, at any level of the call stack, catch the exception, log any additional information if needed, and handle the issue.

If the issue is not fully handled, re-raise the exception so that upper levels can catch it and perform different actions.

At any of these stages you can keep a counter for certain types of exceptions, so that some actions are taken only once a specific number of issues have occurred.
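A rough sketch of this approach, assuming the standard `logging` module and `collections.Counter`; the functions `fetch` and `fetch_all` and the `max_value_errors` threshold are hypothetical names chosen for the example:

```python
import logging
from collections import Counter

logger = logging.getLogger(__name__)


def fetch(item):
    """Hypothetical low-level step: log the error, then let it propagate."""
    try:
        return int(item)
    except ValueError:
        logger.error("could not parse %r", item)
        raise


def fetch_all(items, max_value_errors=2):
    """Hypothetical caller: tolerates failures up to a threshold."""
    counts = Counter()
    results = []
    for item in items:
        try:
            results.append(fetch(item))
        except ValueError:
            counts["ValueError"] += 1
            if counts["ValueError"] > max_value_errors:
                logger.error("too many bad items, giving up")
                raise  # not fully handled here: let upper levels decide
    return results, counts
```

Here a few bad items are logged and skipped, while the exception that pushes the count past the threshold is re-raised to whoever called `fetch_all`.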

Upvotes: 0
