Reputation: 1863
I'm trying to get to the bottom of a problem with Django and database connection errors. At this point I'm after debugging tips as I think the symptoms are too non-specific.
Some background - I've been using this stack, deployed in AWS for many years without issue:
An AWS load balancer sends traffic to the Ubuntu instance, which is handled by Nginx, which forwards on to Django (3.2.6) running in Uwsgi. Django connects to the database using psycopg2 (2.9.1). Normally this setup works perfectly for me.
The issue I have it that the database connection seems to be closing randomly. Django reports errors like this:
Traceback (most recent call last):
[my code...]
for answer in q.select_related('entry__session__player'):
File "/usr/local/lib/python3.8/dist-packages/django/db/models/query.py", line 280, in __iter__
self._fetch_all()
File "/usr/local/lib/python3.8/dist-packages/django/db/models/query.py", line 1324, in _fetch_all
self._result_cache = list(self._iterable_class(self))
File "/usr/local/lib/python3.8/dist-packages/django/db/models/query.py", line 51, in __iter__
results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
File "/usr/local/lib/python3.8/dist-packages/django/db/models/sql/compiler.py", line 1173, in execute_sql
cursor = self.connection.cursor()
File "/usr/local/lib/python3.8/dist-packages/django/utils/asyncio.py", line 26, in inner
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/django/db/backends/base/base.py", line 259, in cursor
return self._cursor()
File "/usr/local/lib/python3.8/dist-packages/django/db/backends/base/base.py", line 237, in _cursor
return self._prepare_cursor(self.create_cursor(name))
File "/usr/local/lib/python3.8/dist-packages/django/db/utils.py", line 90, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/usr/local/lib/python3.8/dist-packages/django/db/backends/base/base.py", line 237, in _cursor
return self._prepare_cursor(self.create_cursor(name))
File "/usr/local/lib/python3.8/dist-packages/django/utils/asyncio.py", line 26, in inner
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/django/db/backends/postgresql/base.py", line 236, in create_cursor
cursor = self.connection.cursor()
django.db.utils.InterfaceError: connection already closed
The location in my code varies. Sometimes (less frequently) I get this too:
Traceback (most recent call last):
[my code...]
group = contest.groups.create(restaurant = restaurant, supergroup = supergroup)
File "/usr/local/lib/python3.8/dist-packages/django/db/models/fields/related_descriptors.py", line 677, in create
return super(RelatedManager, self.db_manager(db)).create(**kwargs)
File "/usr/local/lib/python3.8/dist-packages/django/db/models/manager.py", line 85, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/django/db/models/query.py", line 453, in create
obj.save(force_insert=True, using=self.db)
File "/usr/local/lib/python3.8/dist-packages/django/db/models/base.py", line 726, in save
self.save_base(using=using, force_insert=force_insert,
File "/usr/local/lib/python3.8/dist-packages/django/db/models/base.py", line 763, in save_base
updated = self._save_table(
File "/usr/local/lib/python3.8/dist-packages/django/db/models/base.py", line 868, in _save_table
results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw)
File "/usr/local/lib/python3.8/dist-packages/django/db/models/base.py", line 906, in _do_insert
return manager._insert(
File "/usr/local/lib/python3.8/dist-packages/django/db/models/manager.py", line 85, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/django/db/models/query.py", line 1270, in _insert
return query.get_compiler(using=using).execute_sql(returning_fields)
File "/usr/local/lib/python3.8/dist-packages/django/db/models/sql/compiler.py", line 1416, in execute_sql
cursor.execute(sql, params)
File "/usr/local/lib/python3.8/dist-packages/django/db/backends/utils.py", line 66, in execute
return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
File "/usr/local/lib/python3.8/dist-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
return executor(sql, params, many, context)
File "/usr/local/lib/python3.8/dist-packages/django/db/backends/utils.py", line 78, in _execute
self.db.validate_no_broken_transaction()
File "/usr/local/lib/python3.8/dist-packages/django/db/backends/base/base.py", line 447, in validate_no_broken_transaction
raise TransactionManagementError(
django.db.transaction.TransactionManagementError: An error occurred in the current transaction. You can't execute queries until the end of the 'atomic' block.
Again, the location in my code varies, and it's not always in a simple create call - sometimes it's bulk_create, sometimes get_or_create. I'm guessing that the underlying cause is probably the same as the 'connection_closed' error but I'm not sure.
So that's what Django tells me. The Postgresql log doesn't contain any errors that coincide in time with the errors reported by Django. The only errors in the log are of this form:
LOG: could not receive data from client: Connection reset by peer
These coincide with uwsgi killing off worker processes (I have a 1000 request limit set for each worker to avoid any potential memory leak issues), so they aren't related.
So Postgresql is reporting no relevant errors - I can only assume that the connection was closed properly, and Django wasn't expecting that. There are no errors in the systemd journal at all on the Ubuntu instance.
I'm unsure how to proceed. I doubt this is a bug in Django but no other component in the system is complaining and it must be a low-level issue. This happens quite rarely, but enough to be a concern - something like 1 in 1000 requests.
Any insights or suggestions of how to investigate this further would be gratefully accepted :)
Upvotes: 3
Views: 1091
Reputation: 1863
I've figured this out. It's pretty wild, but makes sense.
Fundamentally, the HTTP client is disconnecting early, while the request is still being processed.
uWSGI must be handling the disconnect in a separate thread than the request handler - this is fine because it acquires the Python GIL before calling into Python.
A totally unexpected (to me anyways!) consequence of this is that the close call is simply injected into the request handler's call stack - the request handler code is running, then suddenly execution switches to the HttpResponse close code. I observed this in stack traces during debugging - see this question for the details.
As part of Django's end-of-request handling, the DB connection may be closed if Django thinks it's broken - this happens because an unexpected transaction is in effect - one started by my own code above the injected HttpResponse.close()
call.
When the HttpReponse.close()
call finishes running and the interpreter returns to my code, the transaction error is raised because the DB connection has been closed.
So really I think this is all as it should be, more or less. It would be nice to have a more specific 'premature close' error rather than the more generic "connection already closed" error, but I'm not sure how you would neatly construct such a thing.
Upvotes: 2