Reputation: 2512
I am trying to diagnose an issue where some of my celery worker processes appear to hang for several minutes. I have many tasks that make several IO calls (usually to third party APIs). In any given job, I might be making several thousand requests to various APIs. I've looked at the logs and they all have on thing in common: they hang after urllib3
makes a connection to a remote url.
At the end of my jobs (takes ~30 minutes), there are usually a few tasks that are hung.
Here is an example of the logs that I use to conclude urllib3
is the culprit:
Jul 08 04:46:26 app/worker.1: [INFO/MainProcess] [???(???)] celery.worker.strategy: Received task: my_celery_task[734a49f6-bf6b-4423-9146-1c48366ba897]
Jul 08 04:46:28 app/worker.1: [DEBUG/Worker-11] [my_celery_task(734a49f6-bf6b-4423-9146-1c48366ba897)] src.aggregates.prospect.services.prospect_service: Beginning: Get social account data. provider_name: twitter, account_uid: some_user
Jul 08 04:46:28 app/worker.1: [INFO/Worker-11] [my_celery_task(734a49f6-bf6b-4423-9146-1c48366ba897)] requests.packages.urllib3.connectionpool: Starting new HTTPS connection (1): api.some_api.com
And then that's it. There is nothing logged after the Starting new HTTPS connection
statement.
This is where I restarted the worker:
Jul 08 05:09:18 app/worker.1: [INFO/MainProcess] [???(???)] celery.worker.strategy: Received task: my_celery_task[734a49f6-bf6b-4423-9146-1c48366ba897]
Jul 08 05:09:19 app/worker.1: [DEBUG/Worker-4] [my_celery_task(734a49f6-bf6b-4423-9146-1c48366ba897)] src.aggregates.prospect.services.prospect_service: Beginning: Get social account data. provider_name: twitter, account_uid: some_user
Jul 08 05:09:19 app/worker.1: [DEBUG/Worker-4] [my_celery_task(734a49f6-bf6b-4423-9146-1c48366ba897)] requests.packages.urllib3.connectionpool: "GET /v2/api_call.json?username=some_user HTTP/1.1" 403 170
Jul 08 05:09:19 app/worker.1: [INFO/Worker-4] [my_celery_task(734a49f6-bf6b-4423-9146-1c48366ba897)] requests.packages.urllib3.connectionpool: Starting new HTTPS connection (1): api.some_api.com
Jul 08 05:09:19 app/worker.1: [INFO/MainProcess] [???(???)] celery.worker.job: Task my_celery_task[734a49f6-bf6b-4423-9146-1c48366ba897] succeeded in 2.265356543008238s: 32345
In this case, I received a 403
status code. So that could be a potential culprit. However, I've seen in the logs that this has happened with status codes of 200
as well.
FWIW, I also see Resetting dropped connection: api.twitter.com
quite often in the logs.
I've made sure to provide a timeout
of 10 seconds everywhere I make a request using the requests
library.
So it appears that the request is made but then just hangs. It's possible the remote servers are responding at a very slow pace and therefore the timeout never actually occurs, but I find this unlikely as my issue does not occur with just one specific domain.
I am using Rabbit 3.1.3 celery:3.1.11 (Cipater) kombu:3.0.16 py:3.4.0 billiard:3.3.0.17 py-amqp:1.4.5. I am using prefork.
I'm using requests==2.3.0.
So why does it appear that my tasks hang after urllib3
logs the Starting new HTTPS connection
statement?
EDIT: I've added a lot of logging and provide further context below
2014-07-16T02:43:53.140381+00:00 app[worker.1]: [INFO/MainProcess/2] [???(???)] celery.worker.strategy: Received task: [973c1361-43c3-41a2-9bcd-6ef850c41fcc]
2014-07-16T02:43:56.951211+00:00 app[worker.1]: [1;34m[DEBUG/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] : twitter, account_uid: some_user[0m
2014-07-16T02:43:56.951876+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.sessions: complete: request: about to get request
2014-07-16T02:43:56.952005+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.sessions: begin: request: about to prep request
2014-07-16T02:43:56.952897+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.sessions: complete: request: about to prep request
2014-07-16T02:43:56.954133+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.adapters: begin: send: about to get response not chunked
2014-07-16T02:43:56.954249+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: urlopen: about to get a conn from the pool
2014-07-16T02:43:56.954360+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: _get_conn: about to get a conn from the pool
2014-07-16T02:43:56.954482+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: complete: _get_conn: about to get a conn from the pool
2014-07-16T02:43:56.954587+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: returning conn or self new conn
2014-07-16T02:43:56.951725+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.sessions: begin: request: about to get request
2014-07-16T02:43:56.954692+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: Starting new HTTPS connection (1): api.fullcontact.com
2014-07-16T02:43:56.954817+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: create connection class
2014-07-16T02:43:56.954948+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: complete: just about to create connection class
2014-07-16T02:43:56.955062+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: complete: returning conn or self new conn
2014-07-16T02:43:56.955166+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: complete: urlopen: about to get a conn from the pool
2014-07-16T02:43:56.955290+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: _make_request: about to call request
2014-07-16T02:43:56.955396+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: httpconection: send req
2014-07-16T02:43:56.953410+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.sessions: begin: request: about to send
2014-07-16T02:43:56.953554+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.adapters: begin: send: about to get conn
2014-07-16T02:43:56.953986+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.adapters: complete: send: about to get conn
2014-07-16T02:43:56.956014+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: httpconection: _send_output
2014-07-16T02:43:56.955517+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: httpconection: put req
2014-07-16T02:43:56.955715+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: completed: httpconection: put req
2014-07-16T02:43:56.956146+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: httpconection: send
2014-07-16T02:43:56.956309+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: https verified conection: connect
2014-07-16T02:43:56.959157+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: completed: https verified : connect: set socket
2014-07-16T02:43:56.959045+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: https verified : connect: set socket
2014-07-16T02:43:56.959295+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: https verified : connect: resolve cert reqs
2014-07-16T02:43:56.959402+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: completed: https verified : connect: resolve cert reqs
2014-07-16T02:43:56.959508+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: https verified : connect: resolve ssl ver
2014-07-16T02:43:56.959613+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: complete: https verified : connect: resolve ssl ver
2014-07-16T02:43:56.959715+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: https verified : connect: ssl_wrap_socket
2014-07-16T02:43:56.959827+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.util.ssl_: begin: ssl_wrap_socket: about to get ssl context
2014-07-16T02:43:56.960012+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.util.ssl_: complete: ssl_wrap_socket: about to get ssl context
2014-07-16T02:43:56.960119+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.util.ssl_: begin: ssl_wrap_socket: load verify locations
# *******************************************
# *******************************************
# This is when I restarted the job. It processed the same exact task, same parameters, same https url, etc in under 1 second.
# *******************************************
# *******************************************
2014-07-16T03:00:21.885531+00:00 app[worker.1]: [INFO/MainProcess/2] [???(???)] celery.worker.strategy: Received task: [973c1361-43c3-41a2-9bcd-6ef850c41fcc]
2014-07-16T03:00:22.009616+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.sessions: begin: request: about to get request
2014-07-16T03:00:22.010313+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.sessions: begin: request: about to prep request
2014-07-16T03:00:22.016669+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.sessions: begin: request: about to send
2014-07-16T03:00:22.018419+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.adapters: complete: send: about to get conn
2014-07-16T03:00:22.051267+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: completed: httpconection: put req
2014-07-16T03:00:22.051744+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: httpconection: send
2014-07-16T03:00:22.051879+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: https verified conection: connect
2014-07-16T03:00:22.072346+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: https verified : connect: set socket
2014-07-16T03:00:22.103234+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.util.ssl_: begin: ssl_wrap_socket: wrap socket with host name
2014-07-16T03:00:22.007638+00:00 app[worker.1]: [1;34m[DEBUG/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] : : twitter, account_uid: some_user[0m
2014-07-16T03:00:22.009969+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.sessions: complete: request: about to get request
2014-07-16T03:00:22.012954+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.sessions: complete: request: about to prep request
2014-07-16T03:00:22.017254+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.adapters: begin: send: about to get conn
2014-07-16T03:00:22.049101+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.adapters: begin: send: about to get response not chunked
2014-07-16T03:00:22.049241+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: urlopen: about to get a conn from the pool
2014-07-16T03:00:22.049361+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: _get_conn: about to get a conn from the pool
2014-07-16T03:00:22.049528+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: complete: _get_conn: about to get a conn from the pool
2014-07-16T03:00:22.049639+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: returning conn or self new conn
2014-07-16T03:00:22.049756+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: Starting new HTTPS connection (1): api.fullcontact.com
2014-07-16T03:00:22.049918+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: create connection class
2014-07-16T03:00:22.050197+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: complete: just about to create connection class
2014-07-16T03:00:22.050318+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: complete: returning conn or self new conn
2014-07-16T03:00:22.050425+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: complete: urlopen: about to get a conn from the pool
2014-07-16T03:00:22.050587+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: _make_request: about to call request
2014-07-16T03:00:22.050777+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: httpconection: send req
2014-07-16T03:00:22.050946+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: httpconection: put req
2014-07-16T03:00:22.051546+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: httpconection: _send_output
2014-07-16T03:00:22.072581+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: completed: https verified : connect: set socket
2014-07-16T03:00:22.072721+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: https verified : connect: resolve cert reqs
2014-07-16T03:00:22.072911+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: completed: https verified : connect: resolve cert reqs
2014-07-16T03:00:22.073044+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: https verified : connect: resolve ssl ver
2014-07-16T03:00:22.073179+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: complete: https verified : connect: resolve ssl ver
2014-07-16T03:00:22.073311+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: https verified : connect: ssl_wrap_socket
2014-07-16T03:00:22.073445+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.util.ssl_: begin: ssl_wrap_socket: about to get ssl context
2014-07-16T03:00:22.073814+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.util.ssl_: complete: ssl_wrap_socket: about to get ssl context
2014-07-16T03:00:22.073979+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.util.ssl_: begin: ssl_wrap_socket: load verify locations
2014-07-16T03:00:22.092134+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.util.ssl_: complete: ssl_wrap_socket: load verify locations
2014-07-16T03:00:22.138197+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.util.ssl_: complete: ssl_wrap_socket: wrap socket with host name
2014-07-16T03:00:22.138207+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: complete: https verified : connect: ssl_wrap_socket
2014-07-16T03:00:22.138209+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: https verified : connect: match hostname
2014-07-16T03:00:22.138211+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: complete: https verified : connect: match host name
2014-07-16T03:00:22.138212+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: completed: https verified conection: connect
2014-07-16T03:00:22.138214+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: httpconection: send: before sock.sendall
2014-07-16T03:00:22.138743+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: completed: httpconection: send: before sock.sendall
2014-07-16T03:00:22.138748+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: completed: httpconection: send
2014-07-16T03:00:22.138796+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: completed: httpconection: _send_output
2014-07-16T03:00:22.139094+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: completed: httpconection: send req
2014-07-16T03:00:22.139207+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: complete: _make_request: about to call request
2014-07-16T03:00:22.139321+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: _make_request: conn.get reponse with buffer
2014-07-16T03:00:22.139925+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: _make_request: conn.get reponse with no buffer
2014-07-16T03:00:22.288112+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.sessions: completest: about to send
2014-07-16T03:00:22.286453+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: complete: _make_request: conn.get reponse with no buffer
2014-07-16T03:00:22.286596+00:00 app[worker.1]: [1;34m[DEBUG/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: "GET /v2/person.json?twitter=some_user&apiKey=5eeaece3982efd1d&style=dictionary HTTP/1.1" 403 170[0m
2014-07-16T03:00:22.286720+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: urlopen: about to get response from httplib
2014-07-16T03:00:22.287050+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: complete: urlopen: about to get response from httplib
2014-07-16T03:00:22.287170+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.adapters: complete: send: about to get response not chunked
2014-07-16T03:00:22.287279+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.adapters: begin: send: about to build response
2014-07-16T03:00:22.287738+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.adapters: complete: send: about to build response
2014-07-16T03:00:24.342182+00:00 app[worker.1]: [INFO/MainProcess/2] [???(???)] celery.worker.job: Task [973c1361-43c3-41a2-9bcd-6ef850c41fcc] succeeded in 2.4521394340554252s: 44508
The last logged entry is
2014-07-16T02:43:56.960119+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.util.ssl_: begin: ssl_wrap_socket: load verify locations
According to my log statement, the culprit is load_verify_locations.
This is the corresponding code in the requests
library:
if ca_certs:
try:
context.load_verify_locations(ca_certs)
# Py32 raises IOError
# Py33 raises FileNotFoundError
except Exception as e: # Reraise as SSLError
raise SSLError(e)
So it appears that context.load_verify_locations(ca_certs)
is causing this to hang. Why?
Upvotes: 11
Views: 13142
Reputation: 15120
Are you using the multiprocessing.dummy
module? i.e the thread pool?
There seems to be a bug in either the Pool
module from multiprocessing.dummy
or urllib3
, which seemingly causes random HTTP connections to hang after they are ESTABLISHED i.e after completion of the TCP SYN-ACK cycle.
A sign of this bug happening would be when you trace the TCP request/response with Wireshark and you see a successful TCP connection being established and then a TCP RST after a few minutes, and NO application protocol traffic in between.
You may want to try switching your multiprocessing Pool code to process based, i.e use multiprocessing instead of multiprocessing.dummy.
i.e instead of
from multiprocessing.dummy import Pool as ThreadPool
use
from multiprocessing import Pool as ProcessPool
The above worked for me and the bug went away.
Upvotes: 0