u.unver34

Reputation: 132

redis.exceptions.ConnectionError after Celery has been running for approximately one day

This is my full trace:

Traceback (most recent call last):
  File "/home/server/backend/venv/lib/python3.4/site-packages/celery/app/trace.py", line 283, in trace_task
    uuid, retval, SUCCESS, request=task_request,
  File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/base.py", line 256, in store_result
    request=request, **kwargs)
  File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/base.py", line 490, in _store_result
    self.set(self.get_key_for_task(task_id), self.encode(meta))
  File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/redis.py", line 160, in set
    return self.ensure(self._set, (key, value), **retry_policy)
  File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/redis.py", line 149, in ensure
    **retry_policy
  File "/home/server/backend/venv/lib/python3.4/site-packages/kombu/utils/__init__.py", line 243, in retry_over_time
    return fun(*args, **kwargs)
  File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/redis.py", line 169, in _set
    pipe.execute()
  File "/home/server/backend/venv/lib/python3.4/site-packages/redis/client.py", line 2593, in execute
    return execute(conn, stack, raise_on_error)
  File "/home/server/backend/venv/lib/python3.4/site-packages/redis/client.py", line 2447, in _execute_transaction
    connection.send_packed_command(all_cmds)
  File "/home/server/backend/venv/lib/python3.4/site-packages/redis/connection.py", line 532, in send_packed_command
    self.connect()
  File "/home/pserver/backend/venv/lib/python3.4/site-packages/redis/connection.py", line 436, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 0 connecting to localhost:6379. Error.
[2016-09-21 10:47:18,814: WARNING/Worker-747] Data collector is not contactable. This can be because of a network issue or because of the data collector being restarted. In the event that contact cannot be made after a period of time then please report this problem to New Relic support for further investigation. The error raised was ConnectionError(ProtocolError('Connection aborted.', BlockingIOError(11, 'Resource temporarily unavailable')),).

I searched extensively for this ConnectionError but found no problem matching mine.

My platform is Ubuntu 14.04. This is part of my Redis config. (I can share the whole redis.conf file if you need it. By the way, all parameters in the LIMITS section are left commented out.)

# By default Redis listens for connections from all the network interfaces
# available on the server. It is possible to listen to just one or multiple
# interfaces using the "bind" configuration directive, followed by one or
# more IP addresses.
#
# Examples:
#
# bind 192.168.1.100 10.0.0.1
bind 127.0.0.1

# Specify the path for the unix socket that will be used to listen for
# incoming connections. There is no default, so Redis will not listen
# on a unix socket when not specified.
#
# unixsocket /var/run/redis/redis.sock
# unixsocketperm 755

# Close the connection after a client is idle for N seconds (0 to disable)
timeout 0

# TCP keepalive.
#
# If non-zero, use SO_KEEPALIVE to send TCP ACKs to clients in absence
# of communication. This is useful for two reasons:
#
# 1) Detect dead peers.
# 2) Take the connection alive from the point of view of network
#    equipment in the middle.
#
# On Linux, the specified value (in seconds) is the period used to send ACKs.
# Note that to close the connection the double of the time is needed.
# On other kernels the period depends on the kernel configuration.
#
# A reasonable value for this option is 60 seconds.
tcp-keepalive 60

This is my mini redis wrapper:

import redis

from django.conf import settings


REDIS_POOL = redis.ConnectionPool(host=settings.REDIS_HOST, port=settings.REDIS_PORT)


def get_redis_server():
    return redis.Redis(connection_pool=REDIS_POOL)

And this is how I use it:

from redis_wrapper import get_redis_server

# the view and the task run in different, independent processes

def sample_view(request):
    rs = get_redis_server()
    # some get-set stuff with redis



@shared_task
def sample_celery_task():
    rs = get_redis_server()
    # some get-set stuff with redis

Package versions:

celery==3.1.18
django-celery==3.1.16
kombu==3.0.26
redis==2.10.3

So the problem is this: the connection error occurs some time after the Celery workers start, and once it first appears, every task fails with it until I restart all of my Celery workers. (Interestingly, Celery Flower also fails during that problematic period.)

I suspect my use of the Redis connection pool, or the Redis configuration, or, less probably, network issues. Any ideas about the cause? What am I doing wrong?

(PS: I will add the redis-cli info results the next time I see this error.)

UPDATE:

I temporarily solved this problem by adding the --maxtasksperchild parameter to my worker start command, set to 200. Of course it is not the proper way to solve this problem, just a symptomatic cure: it periodically recycles each worker instance (closing the old process and creating a new one once it has handled 200 tasks), which refreshes my global Redis pool and its connections. So I think I should focus on how I use the global Redis connection pool, and I'm still waiting for new ideas and comments.

Sorry for my bad English and thanks in advance.

Upvotes: 3

Views: 2731

Answers (1)

Anoop Reghuvaran

Reputation: 349

Have you enabled the RDB background-save method in Redis? If so, check the size of the dump.rdb file in /var/lib/redis. Sometimes the file grows until it fills the root partition, and the Redis instance can no longer save to it.

You can stop Redis from rejecting writes after a failed background save by issuing

config set stop-writes-on-bgsave-error no

on redis-cli.
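The check above can be scripted; a sketch (the helper name is mine, and it assumes a redis-py-style client exposing info() and config_set()):

```python
def recover_from_failed_bgsave(client):
    """If the last RDB background save failed, stop Redis from
    rejecting writes because of it.

    With the default ``stop-writes-on-bgsave-error yes``, a full disk
    (e.g. a dump.rdb that has filled the root partition) makes every
    write command fail until a save succeeds again.  Flipping the flag
    lets writes through while you free up disk space.
    """
    info = client.info("persistence")
    if info.get("rdb_last_bgsave_status") == "err":
        client.config_set("stop-writes-on-bgsave-error", "no")
        return True
    return False
```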

Upvotes: -1
