Xun Lee

Reputation: 312

Luigi framework crash

When I run Luigi tasks, the framework sometimes crashes, which causes all subsequent tasks to fail. Here is the error log:

2017-10-05 22:02:02,564 luigi-interface WARNING  Failed pinging scheduler
2017-10-05 22:02:03,129 requests.packages.urllib3.connectionpool INFO     Starting new HTTP connection (126): localhost
2017-10-05 22:02:03,130 luigi-interface ERROR    Failed connecting to remote scheduler 'http://localhost:8082'
Traceback (most recent call last):
    ...
    File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 585, in send
    r = adapter.send(request, **kwargs)
    File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/requests/adapters.py", line 467, in send
    raise ConnectionError(e, request=request)
    ConnectionError: HTTPConnectionPool(host='localhost', port=8082): Max retries exceeded with url: /api/add_worker (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f15128cb3d0>: Failed to establish a new connection: [Errno 111] Connection refused',))
2017-10-05 22:02:03,180 luigi-interface INFO     Worker Worker(salt=150908931, workers=3, host=etl2, username=develop, pid=18019) was stopped. Shutting down Keep-Alive thread
Traceback (most recent call last):
    File "app_metadata.py", line 1567, in <module>
    luigi.run()
    File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/luigi/interface.py", line 210, in run
    return _run(*args, **kwargs)['success']
    File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/luigi/interface.py", line 238, in _run
    return _schedule_and_run([cp.get_task_obj()], worker_scheduler_factory)
    File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/luigi/interface.py", line 197, in _schedule_and_run
    success &= worker.run()
    File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/luigi/worker.py", line 867, in run
    self._add_worker()
    File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/luigi/worker.py", line 652, in _add_worker
    self._scheduler.add_worker(self._id, self._worker_info)
    File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/luigi/rpc.py", line 219, in add_worker
    return self._request('/api/add_worker', {'worker': worker, 'info': info})
    File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/luigi/rpc.py", line 146, in _request
    page = self._fetch(url, body, log_exceptions, attempts)
    File "/home/develop/data_warehouse/venv/local/lib/python2.7/site-packages/luigi/rpc.py", line 138, in _fetch
    last_exception
    luigi.rpc.RPCError: Errors (3 attempts) when connecting to remote scheduler 'http://localhost:8082'

It looks like the worker tries to ping the central scheduler, fails, and then crashes. All later tasks are then blocked and cannot run successfully.

Someone else has hit a similar error, but their resolution does not work for me: GitHub - Failed connecting to remote scheduler #1894

Upvotes: 2

Views: 3633

Answers (2)

cangers

Reputation: 408

Have you configured the central scheduler properly? See the docs: https://luigi.readthedocs.io/en/stable/central_scheduler.html

If not, try using the local scheduler by specifying --local-scheduler from the command line.
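To decide between the two at launch time, a small pre-flight check can help. The sketch below is only an illustration: it assumes the scheduler address from the question's log (localhost:8082), and the helper names `scheduler_is_reachable` and `extra_luigi_args` are made up for this example, not part of Luigi.

```python
import socket


def scheduler_is_reachable(host="localhost", port=8082, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except socket.error:
        # Covers "connection refused", timeouts and name lookup failures.
        return False
    sock.close()
    return True


def extra_luigi_args(host="localhost", port=8082):
    """Hypothetical helper: extra command-line arguments for a Luigi run.

    Returns an empty list when the central scheduler answers, otherwise
    falls back to the --local-scheduler flag mentioned above.
    """
    if scheduler_is_reachable(host, port):
        return []
    return ["--local-scheduler"]
```

The returned list can be appended to whatever command line you already use to start the task. Note that this only tells you a process is listening on the port, not that the scheduler is healthy.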

Upvotes: 0

MattMcKnight

Reputation: 8290

I would try making the timeout a little longer if your central scheduler is getting overloaded. You could also increase retries and retry wait time.

In luigi.cfg:

[core]
# default is 10.0
rpc-connect-timeout=60.0
# default is 3
rpc-retry-attempts=10
# default is 30
rpc-retry-wait=60

You may also want to add a watchdog so that the scheduler process restarts automatically if it crashes.
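As a rough idea of what such a watchdog could look like, here is a minimal sketch using only the standard library. The `supervise` function and its parameters are hypothetical names for this example; in practice a process supervisor such as systemd or supervisord is probably a better fit.

```python
import subprocess
import time


def supervise(cmd, max_restarts=5, delay=5.0):
    """Run cmd and restart it whenever it exits with a non-zero code.

    Stops after a clean exit (code 0) or once max_restarts restarts
    have been used. Returns how many times the command was started.
    """
    starts = 0
    while True:
        starts += 1
        exit_code = subprocess.call(cmd)  # blocks until the process exits
        if exit_code == 0 or starts > max_restarts:
            return starts
        time.sleep(delay)  # short pause before the next restart
```

Calling `supervise(["luigid"])` would keep the central scheduler daemon running in the foreground and restart it up to five times, assuming `luigid` is on the PATH.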

Upvotes: 2
