Python parallel, semaphore leak warning and abort without traceback

I am running a parallelized grid search in Python using joblib.Parallel. My script is relatively straightforward:

    # Imports for data and classes...

    Parallel(n_jobs=n_jobs)(
        delayed(biz_model)(
            ...
        )
        for ml_model_params in grid
        for past_horizon in past_horizons
    )

When I run it on my local machine it seems to run fine, though I can only test it on small datasets for memory reasons. Yet when I try to run it on a remote Oracle Linux server, it begins some runs and after a while outputs:

    /u01/.../resources/python/lib/python3.6/multiprocessing/semaphore_tracker.py:143: UserWarning: semaphore_tracker: There appear to be 1 leaked semaphores to clean up at shutdown
      len(cache))

    Aborted!

I tried to reproduce it locally, and with small experiments it does run. The unparallelized script also runs, and the number of jobs (whether low or high) doesn't prevent the bug from happening.

So my question is: given that there is no traceback, is there a way to make joblib or Parallel more verbose? Without a traceback I cannot quite get an idea of where to look for possible causes of the failure. Of course, if a likely reason for the abort can be inferred from this output alone (and I have simply failed to grasp it), I would very much appreciate the pointer.
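
For illustration, the only verbosity I am aware of is Parallel's own verbose argument, which, as far as I can tell, only reports scheduling progress and not the underlying error:

    # verbose=10 prints progress messages as tasks are dispatched and completed,
    # but it does not show a traceback for a failing worker.
    Parallel(n_jobs=n_jobs, verbose=10)(
        delayed(biz_model)(
            ...
        )
        for ml_model_params in grid
        for past_horizon in past_horizons
    )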

Thanks in advance.

Upvotes: 3

Views: 2289

Answers (1)

Using a logger, catching the exception, logging it, flushing the logs and raising it again usually does the trick:

    # Imports for data and classes...
    # Creates logger

    try:
        Parallel(n_jobs=n_jobs)(
            delayed(biz_model)(
                ...
            )
            for ml_model_params in grid
            for past_horizon in past_horizons
        )
    except BaseException as e:
        # joblib re-raises worker exceptions in the parent process,
        # so they can be caught, logged and flushed here before re-raising.
        logger.exception(e)
        # use a for loop here if you have more than one handler
        logger.handlers[0].flush()
        raise
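
A minimal sketch of one way to create that logger, assuming a file handler so the logged exception survives even if the parent process aborts before printing anything (the logger name, file name and format below are placeholders):

    import logging

    # Example logger setup (assumed configuration): write to a file so the
    # exception is preserved even when the run aborts without a traceback.
    logger = logging.getLogger("grid_search")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler("grid_search.log")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)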

Upvotes: 2
