dev_prodigy
dev_prodigy

Reputation: 141

SQLAlchemy - work with connections to DB and Sessions (not clear behavior and part in documentation)

I use SQLAlchemy (really good ORM but documentation is not clear enough) for communicating with PostgreSQL
Everything was great till one case when postgres "crashed" cause of maximum connection limits was reached: no more connections allowed (max_client_conn).
That case makes me think that I do smth wrong. After few experiments I figure out how not to face that issue again, but some questions left

Below you'll see code examples (in Python 3+, PostgreSQL settings are default) without and with mentioned issue, and what I'd like to hear eventually is answers on following questions:

  1. What exactly does context manager do with connections and sessions? Closing session and disposing connection or what?
  2. Why does first working example of code behave as example with issue without NullPool as poolclass in "connect" method?
  3. Why in the first example I got only 1 connection to db for all queries but in second example I got separate connection for each query? (please correct me if I understood it wrong, was checking it with "pgbouncer")
  4. What is the best practices to open and close connections(and/or work with Session) when you use SQLAlchemy and PostgreSQL DB for multiple instances of script (or separate threads in script) that listens requests and has to have separate session to each of them? (I mean raw SQLAlchemy not Flask-SQLAlchemy or smth like this)

    Working example of code without issue:

making connection to DB:

from sqlalchemy.pool import NullPool  # does not work without NullPool, why?

def connect(user, password, db, host='localhost', port=5432):
    """Returns a connection and a metadata object"""
    url = 'postgresql://{}:{}@{}:{}/{}'.format(user, password, host, port, db)

    temp_con = sqlalchemy.create_engine(url, client_encoding='utf8', poolclass=NullPool)
    temp_meta = sqlalchemy.MetaData(bind=temp_con, reflect=True)

    return temp_con, temp_meta

function to get session to work with DB:

from contextlib import contextmanager

@contextmanager
def session_scope():
    con_loc, meta_loc = connect(db_user, db_pass, db_instance, 'localhost')
    Session = sessionmaker(bind=con_loc)

    """Provide a transactional scope around a series of operations."""
    session = Session()
    try:
        yield session
        session.commit()
    except:
        session.rollback()
        raise

query example:

with session_scope() as session:
    entity = session.query(SomeEntity).first()


Failing example of code:

function to get session to work with DB:

def create_session():
    # connect method the same as in first example
    con, meta = connect(db_user, db_pass, db_instance, 'localhost')
    Session = sessionmaker(bind=con)

    session = Session()
    return session

query example:

session = create_session()
entity = session.query(SomeEntity).first()


Hope you got the main idea

Upvotes: 12

Views: 15443

Answers (3)

Mike
Mike

Reputation: 2614

  1. What exactly does context manager do with connections and sessions? Closing session and disposing connection or what?

The context manager in Python is used to create a runtime context for use with the with statement. Simply, when you run the code:

with session_scope() as session:
    entity = session.query(SomeEntity).first()

session is the yielded session. So, to your question of what the context manager does with the connections and sessions, all you have to do is look at what happens after the yield to see what happens. In this case it's just:

try:
    yield session
    session.commit()
except:
    session.rollback()
    raise

If you trigger no exceptions, it will be session.commit(), which according to the SQLAlchemy docs will "Flush pending changes and commit the current transaction."

  1. Why does first working example of code behave as example with issue without NullPool as poolclass in "connect" method?

The poolclass argument is just telling SQLAlchemy which subclass of Pool to use. However, in the case where you pass NullPool here, you are telling SQLAlchemy to not use a pool. You're effectively disabling pooling connections when you pass in NullPool. From the docs: "to disable pooling, set poolclass to NullPool instead." I can't say for sure but using NullPool is probably contributing to your max_connection issues.

  1. Why in the first example I got only 1 connection to db for all queries but in second example I got separate connection for each query? (please correct me if I understood it wrong, was checking it with "pgbouncer")

I'm not exactly sure. I think this has to do with how in the first example, you are using a context manager so everything within the with block will use a session generator. In your second example, you created a function that initializes a new Session and returns it, so you're not getting back a generator. I also think this has to do with your NullPool use which prevents connection pooling. With NullPool each query execution is acquiring a connection on its own.

  1. What is the best practices to open and close connections(and/or work with Session) when you use SQLAlchemy and PostgreSQL DB for multiple instances of script (or separate threads in script) that listens requests and has to have separate session to each of them? (I mean raw SQLAlchemy not Flask-SQLAlchemy or smth like this)

See the section Is the session thread-safe? for this, but you need to take a "share nothing" approach to your concurrency. So in your case, you need each instance of a script to share nothing between each other.

You probably want to check out Working with Engines and Connections. I don't think messing with sessions is where you want to be if concurrency is what you're working on. There's more information about the NullPool and concurrency there:

For a multiple-process application that uses the os.fork system call, or for example the Python multiprocessing module, it’s usually required that a separate Engine be used for each child process. This is because the Engine maintains a reference to a connection pool that ultimately references DBAPI connections - these tend to not be portable across process boundaries. An Engine that is configured not to use pooling (which is achieved via the usage of NullPool) does not have this requirement.

Upvotes: 1

dev_prodigy
dev_prodigy

Reputation: 141

@Ilja Everilä answer was mostly helpful
I'll leave edited code here, maybe it'll help someone

New code that works like I expected is following:

making connection to DB::

from sqlalchemy.pool import NullPool  # will work even without NullPool in code

def connect(user, password, db, host='localhost', port=5432):
   """Returns a connection and a metadata object"""
   url = 'postgresql://{}:{}@{}:{}/{}'.format(user, password, host, port, db)

   temp_con = sqlalchemy.create_engine(url, client_encoding='utf8', poolclass=NullPool)
   temp_meta = sqlalchemy.MetaData(bind=temp_con, reflect=True)

   return temp_con, temp_meta 


one instance of connection and sessionmaker per app, for example where your main function:

from sqlalchemy.orm import sessionmaker

# create one connection and Sessionmaker to each instance of app (to avoid creating it repeatedly)
con, meta = connect(db_user, db_pass, db_instance, db_host)
session_maker = sessionmaker(bind=con) enter code here


function to get session with with statement:

from contextlib import contextmanager
from some_place import session_maker

@contextmanager
def session_scope() -> Session:
    """Provide a transactional scope around a series of operations."""
    session = session_maker()  # create session from SQLAlchemy sessionmaker
    try:
        yield session
        session.commit()
    except:
        session.rollback()
        raise


wrap transaction and use session:

with session_scope() as session:
    entity = session.query(SomeEntity).first()

Upvotes: 1

Ilja Everilä
Ilja Everilä

Reputation: 52939

First of all you should not create engines repeatedly in your connect() function. The usual practice is to have a single global Engine instance per database URL in your application. The same goes for the Session class created by the sessionmaker().

  1. What exactly does context manager do with connections and sessions? Closing session and disposing connection or what?

What you've programmed it to do, and if this seems unclear, read about context managers in general. In this case it commits or rolls back the session if an exception was raised within the block governed by the with-statement. Both actions return the connection used by the session to the pool, which in your case is a NullPool, so the connection is simply closed.

  1. Why does first working example of code behave as example with issue without NullPool as poolclass in "connect" method?

and

from sqlalchemy.pool import NullPool  # does not work without NullPool, why?

Without NullPool the engines you repeatedly create also pool connections, so if they for some reason do not go out of scope, or their refcounts are otherwise not zeroed, they will hold on to the connections even if the sessions return them. It is unclear if the sessions go out of scope timely in the second example, so they might also be holding on to the connections.

  1. Why in the first example I got only 1 connection to db for all queries but in second example I got separate connection for each query? (please correct me if I understood it wrong, was checking it with "pgbouncer")

The first example ends up closing the connection due to the use of the context manager that handles transactions properly and the NullPool, so the connection is returned to the bouncer, which is another pool layer.

The second example might never close the connections because it lacks the transaction handling, but that's unclear due to the example given. It also might be holding on to connections in the separate engines that you create.

The 4th point of your question set is pretty much covered by the official documentation in "Session Basics", especially "When do I construct a Session, when do I commit it, and when do I close it?" and "Is the session thread-safe?".

There's one exception: multiple instances of the script. You should not share an engine between processes, so in order to pool connections between them you need an external pool such as the PgBouncer.

Upvotes: 15

Related Questions