Screamer

Reputation: 1171

How to diagnose extra SQLAlchemy connections in Pyramid

When my app runs, I'm frequently hitting connection-pooling errors (one is "QueuePool limit of size 5 overflow 10 reached", another is "FATAL: remaining connection slots are reserved for non-replication superuser connections").

I have a feeling that it's due to some code not closing connections properly, or other code greedily opening new ones when it shouldn't, but since I'm using the default SQLAlchemy settings, I assume the connection-pool defaults aren't unreasonable. We create the session the scoped_session(sessionmaker()) way, so multiple threads are supported.

So my main question is: is there a tool or way to find out where the connections are going? Short of being able to see the moment a new connection is created (one that shouldn't be), are there any obvious anti-patterns that might produce this effect?
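
To give an idea of the kind of visibility I'm after, this is roughly the sort of instrumentation I have in mind (an untested sketch; engine is whatever our create_engine() call returns, and attach_pool_logging is just an illustrative name):

import logging
import traceback

from sqlalchemy import event

log = logging.getLogger("pool_debug")

def attach_pool_logging(engine):
    """Log every connection checkout/checkin, with a short stack trace
    so we can see which code path grabbed the connection."""

    @event.listens_for(engine, "checkout")
    def on_checkout(dbapi_conn, conn_record, conn_proxy):
        log.info("checkout  %s\n%s", engine.pool.status(),
                 "".join(traceback.format_stack(limit=5)))

    @event.listens_for(engine, "checkin")
    def on_checkin(dbapi_conn, conn_record):
        log.info("checkin   %s", engine.pool.status())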

Pyramid is very unopinionated, and for DB connections there seem to be two main approaches (both apparently equally well supported by Pyramid). In our case, the code base already used one approach when I started the job (I'll call it the "globals" approach), and we've agreed to switch to another that relies less on globals and more on Pythonic idioms.

About our architecture: the application comprises one repo which houses the Pyramid project and then sources a number of other git modules, each of which has its own connection setup. The "globals" way connects to the database in a very non-ORM fashion, e.g.:

(in each repo's __init__ file)
from sqlalchemy import Table, Column, String

def load_database():
    global tables

    # 'tables' and 'metadata' are module-level globals shared across repos
    tables['table_name'] = Table(
        'table_name', metadata,
        Column('column_name', String),
    )

Related globals are peppered all over the code:

import sqlalchemy

def function_needing_data(field_value):
    # 'db' is a globally shared engine, 'tables' the global dict from above
    global db, tables
    select = sqlalchemy.sql.select(
        [tables['table_name'].c.data],
        tables['table_name'].c.name == field_value)
    return db.execute(select)

Each git repo latches onto this tables variable and adds some more table definitions, and somehow the global tables dict ends up working, providing access to all of the tables.
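
My best guess at why this works (the snippet below is illustrative, not our exact code, and main_project.db is a made-up module name): each repo imports the shared dict and mutates it in place, so every definition lands in the one object:

(in another repo's __init__ file, illustrative)
from main_project.db import tables, metadata
from sqlalchemy import Table, Column, Integer

# mutating the imported dict in place, so the main project sees this too
tables['other_table'] = Table(
    'other_table', metadata,
    Column('id', Integer, primary_key=True),
)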

The approach we've moved to (although parts of both approaches are still in the code at this point) uses a centralised connection, binds all of the metadata to it, and queries the DB through the ORM:

(model)
class ModelName(MetaDataBase):
    __tablename__ = "models_table_name"
    ... (field values)

(function requiring data)
from models.db import DBSession
from models.model_name import ModelName

def function_needing_data(field_value):
    return DBSession.query(ModelName).filter(
        ModelName.field_value == field_value).all()
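
For reference, the DBSession in models.db is set up roughly like this (a simplified sketch; initialize_sql is an illustrative name rather than the exact function in our code):

(models/db.py, simplified)
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import scoped_session, sessionmaker

# Thread-local session shared across the app; the engine uses SQLAlchemy's
# default QueuePool (pool_size=5, max_overflow=10).
DBSession = scoped_session(sessionmaker())
MetaDataBase = declarative_base()

def initialize_sql(db_url):
    engine = create_engine(db_url)
    DBSession.configure(bind=engine)
    MetaDataBase.metadata.bind = engine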

We've largely moved the code over to the latter approach, which feels right, but perhaps I'm mistaken. I don't know whether there is anything inherently good or bad in either approach, but could one of them be part of the reason we keep running out of connections? Is there a telltale sign I should look out for?

Upvotes: 2

Views: 377

Answers (1)

Screamer

Reputation: 1171

It appears that Pyramid functions best (in terms of handling the connection pool) when you use the Pyramid transaction manager (pyramid_tm). This excellent article by Jon Rosebaugh provides some helpful insight into both how Pyramid apps typically set up their database connections and how they should set them up.

In my case, it was necessary to include the pyramid_tm package and then remove the few places where we were manually committing session changes, since pyramid_tm commits automatically at the end of the request if it doesn't see a reason not to (such as an unhandled exception).
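
Concretely, the change amounted to something like this (a sketch rather than our exact setup; DBSession is the session from the question, and zope.sqlalchemy's register() is what joins it to the transaction manager):

(in the Pyramid project's __init__ file)
from pyramid.config import Configurator
from zope.sqlalchemy import register

from models.db import DBSession

def main(global_config, **settings):
    config = Configurator(settings=settings)
    config.include('pyramid_tm')   # commit or abort at the end of each request
    register(DBSession)            # join the session to the transaction manager
    config.scan()
    return config.make_wsgi_app()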

[Update]

I continued to have connection-pooling issues, although far fewer of them. After a lot of debugging, I found that the Pyramid transaction manager (if you're using it correctly) should not be the issue at all. The remaining connection-pooling issues turned out to involve scripts run via cron jobs. A script will release its connections when it's finished, but bad job design can mean a new instance of the same script starts while the previous one is still running (causing both to run slower, slow enough that both are still running when a third instance starts, and so on).

This is a language- and database-agnostic error, since it stems from poor job-scripting design, but it's worth keeping in mind. In my case, the script was launched with an "&" at the end, so each instance was started as a background process, followed by a 10-second wait and then the next instance, rather than making sure the first job started AND completed, then waiting 10 seconds, then starting another.
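
One simple way to guard against that kind of overlap (a sketch, not the exact script we use; the lock path and do_the_actual_work are placeholders) is to take an exclusive lock on a file and exit immediately if a previous run still holds it:

import fcntl
import sys

def main():
    lock_file = open("/tmp/my_cron_job.lock", "w")
    try:
        # Non-blocking exclusive lock; fails if another instance holds it.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        sys.exit("previous run still in progress, exiting")

    do_the_actual_work()   # placeholder for the job's real logic

if __name__ == "__main__":
    main()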

Hope this helps when debugging this very frustrating and thorny issue.

Upvotes: 1
