Richard
Richard

Reputation: 1863

Random database disconnects with Django and Postgresql in AWS

I'm trying to get to the bottom of a problem with Django and database connection errors. At this point I'm after debugging tips as I think the symptoms are too non-specific.

Some background - I've been using this stack, deployed in AWS for many years without issue:

An AWS load balancer sends traffic to the Ubuntu instance, which is handled by Nginx, which forwards on to Django (3.2.6) running in Uwsgi. Django connects to the database using psycopg2 (2.9.1). Normally this setup works perfectly for me.

The issue I have it that the database connection seems to be closing randomly. Django reports errors like this:

Traceback (most recent call last):
  [my code...]
    for answer in q.select_related('entry__session__player'):
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/query.py", line 280, in __iter__
    self._fetch_all()
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/query.py", line 1324, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/query.py", line 51, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/sql/compiler.py", line 1173, in execute_sql
    cursor = self.connection.cursor()
  File "/usr/local/lib/python3.8/dist-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/django/db/backends/base/base.py", line 259, in cursor
    return self._cursor()
  File "/usr/local/lib/python3.8/dist-packages/django/db/backends/base/base.py", line 237, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "/usr/local/lib/python3.8/dist-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/local/lib/python3.8/dist-packages/django/db/backends/base/base.py", line 237, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "/usr/local/lib/python3.8/dist-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/django/db/backends/postgresql/base.py", line 236, in create_cursor
    cursor = self.connection.cursor()
django.db.utils.InterfaceError: connection already closed

The location in my code varies. Sometimes (less frequently) I get this too:

Traceback (most recent call last):
  [my code...]
    group = contest.groups.create(restaurant = restaurant, supergroup = supergroup)
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/fields/related_descriptors.py", line 677, in create
    return super(RelatedManager, self.db_manager(db)).create(**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/query.py", line 453, in create
    obj.save(force_insert=True, using=self.db)
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/base.py", line 726, in save
    self.save_base(using=using, force_insert=force_insert,
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/base.py", line 763, in save_base
    updated = self._save_table(
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/base.py", line 868, in _save_table
    results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw)
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/base.py", line 906, in _do_insert
    return manager._insert(
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/query.py", line 1270, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/sql/compiler.py", line 1416, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python3.8/dist-packages/django/db/backends/utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/usr/local/lib/python3.8/dist-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/usr/local/lib/python3.8/dist-packages/django/db/backends/utils.py", line 78, in _execute
    self.db.validate_no_broken_transaction()
  File "/usr/local/lib/python3.8/dist-packages/django/db/backends/base/base.py", line 447, in validate_no_broken_transaction
    raise TransactionManagementError(
django.db.transaction.TransactionManagementError: An error occurred in the current transaction. You can't execute queries until the end of the 'atomic' block.

Again, the location in my code varies, and it's not always in a simple create call - sometimes it's bulk_create, sometimes get_or_create. I'm guessing that the underlying cause is probably the same as the 'connection_closed' error but I'm not sure.

So that's what Django tells me. The Postgresql log doesn't contain any errors that coincide in time with the errors reported by Django. The only errors in the log are of this form:

LOG: could not receive data from client: Connection reset by peer

These coincide with uwsgi killing off worker processes (I have a 1000 request limit set for each worker to avoid any potential memory leak issues), so they aren't related.

So Postgresql is reporting no relevant errors - I can only assume that the connection was closed properly, and Django wasn't expecting that. There are no errors in the systemd journal at all on the Ubuntu instance.

I'm unsure how to proceed. I doubt this is a bug in Django but no other component in the system is complaining and it must be a low-level issue. This happens quite rarely, but enough to be a concern - something like 1 in 1000 requests.

Any insights or suggestions of how to investigate this further would be gratefully accepted :)

Upvotes: 3

Views: 1091

Answers (1)

Richard
Richard

Reputation: 1863

I've figured this out. It's pretty wild, but makes sense.

Fundamentally, the HTTP client is disconnecting early, while the request is still being processed.

uWSGI must be handling the disconnect in a separate thread than the request handler - this is fine because it acquires the Python GIL before calling into Python.

A totally unexpected (to me anyways!) consequence of this is that the close call is simply injected into the request handler's call stack - the request handler code is running, then suddenly execution switches to the HttpResponse close code. I observed this in stack traces during debugging - see this question for the details.

As part of Django's end-of-request handling, the DB connection may be closed if Django thinks it's broken - this happens because an unexpected transaction is in effect - one started by my own code above the injected HttpResponse.close() call.

When the HttpReponse.close() call finishes running and the interpreter returns to my code, the transaction error is raised because the DB connection has been closed.

So really I think this is all as it should be, more or less. It would be nice to have a more specific 'premature close' error rather than the more generic "connection already closed" error, but I'm not sure how you would neatly construct such a thing.

Upvotes: 2

Related Questions