mariotomo

Querying objects added to a non committed session in SQLAlchemy

So I placed this question without much context and got downvoted; let's try again...

For one, I don't follow the logic behind SQLAlchemy's session.add. I understand that it queues the object for insertion, and I understand that session.query looks in the connected database rather than in the session. But is it at all possible, within SQLAlchemy, to query the session without first doing session.flush? My expectation from something that reads session.query is that it queries the session...

I am now manually looking in session.new after a None comes out of session.query().first().
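The workaround described above can be sketched like this (a minimal standalone example, not bauble's code; the model and the in-memory engine here are illustrative):

```python
# Minimal sketch: searching session.new by hand for a pending object
# that session.query() cannot yet see (nothing has been flushed).
from sqlalchemy import Column, Unicode, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Location(Base):
    __tablename__ = 'location'
    code = Column(Unicode(64), primary_key=True)

engine = create_engine('sqlite:///:memory:')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine, autoflush=False)()

session.add(Location(code=u'mario'))

# session.query(...).first() returns None here, since the INSERT has
# not been flushed -- so scan the pending objects manually instead:
queried = session.query(Location).filter_by(code=u'mario').first()
pending = next((obj for obj in session.new
                if isinstance(obj, Location) and obj.code == u'mario'), None)
print(queried)             # None
print(pending is not None) # True
```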


There are two reasons why I don't want to do session.flush before my session.query:

  • one based on efficiency fears (why should I write to the database, and query the database, if I am still within a session which the user may want to rollback?);

  • two is because I've adopted a fairly large program, and it manages to define its own Session whose instances cause flush to also commit.

So really the core of this question is: who's going to help me find an error in a GPL program on GitHub?

This is a code snippet with a surprising behaviour in bauble/ghini:

# setting up things in ghini
# <replace-later>
import bauble
import bauble.db as db
db.open('sqlite:///:memory:', verify=False)
from bauble.prefs import prefs
import bauble.pluginmgr as pluginmgr
prefs.init()
prefs.testing = True
pluginmgr.load()
db.create(True)
Session = bauble.db.Session
from bauble.plugins.garden import Location
# </replace-later>

# now just plain straightforward usage
session = Session()

session.query(Location).delete()
session.commit()
u0 = session.query(Location).filter_by(code=u'mario').first()
print u0

u1 = Location(code=u'mario')
session.add(u1)
session.flush()

u2 = session.query(Location).filter_by(code=u'mario').one()
print u1, u2, u1==u2

session.rollback()
u3 = session.query(Location).filter_by(code=u'mario').first()
print u3

the output here is:

None
mario mario True
mario

Here is what I think is just standard, simple code to set up a database:

from sqlalchemy import Column, Unicode

from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()

class Location(Base):
    __tablename__ = 'location'
    code = Column(Unicode(64), index=True, primary_key=True)
    def __init__(self, code=None):
        self.code = code
    def __repr__(self):
        return self.code

from sqlalchemy import create_engine
engine = create_engine('sqlite:///joindemo.db')

Base.metadata.create_all(engine)

from sqlalchemy.orm import sessionmaker
Session = sessionmaker(bind=engine, autoflush=False)

With this setup, the output of the same snippet above is less surprising:

None
mario mario True
None


Answers (1)

Ilja Everilä

The reason why flushes in bauble end up emitting a COMMIT is line 133 of db.py, where they handle their history table:

table.insert(dict(table_name=mapper.local_table.name,
                  table_id=instance.id, values=str(row),
                  operation=operation, user=user,
                  timestamp=datetime.datetime.today())).execute()

Instead of issuing the additional SQL in the event handler using the passed-in transactional connection, as they should, they execute the statement as is, which means it ends up using the engine as its bind (found through the table's metadata). Executing through the engine has autocommit behaviour. Since bauble always uses a SingletonThreadPool, there is just one connection per thread, so that statement ends up committing the flushed changes as well. I wonder if this bug is the reason why bauble disables autoflush...

The fix is to change the event handling to use the transactional connection:

class HistoryExtension(orm.MapperExtension):
    """
    HistoryExtension is a
    :class:`~sqlalchemy.orm.interfaces.MapperExtension` that is added
    to all clases that inherit from bauble.db.Base so that all
    inserts, updates, and deletes made to the mapped objects are
    recorded in the `history` table.
    """
    def _add(self, operation, mapper, connection, instance):
        """
        Add a new entry to the history table.
        """
        ...  # a ton of code here
        table = History.__table__
        stmt = table.insert(dict(table_name=mapper.local_table.name,
                                 table_id=instance.id, values=str(row),
                                 operation=operation, user=user,
                                 timestamp=datetime.datetime.today()))
        connection.execute(stmt)

    def after_update(self, mapper, connection, instance):
        self._add('update', mapper, connection, instance)

    def after_insert(self, mapper, connection, instance):
        self._add('insert', mapper, connection, instance)

    def after_delete(self, mapper, connection, instance):
        self._add('delete', mapper, connection, instance)

It's worth noting that MapperExtension has been deprecated since version 0.7 in favour of the event API.
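A self-contained sketch of the same history recording written against the modern event API (the History and Location models here are illustrative stand-ins, not bauble's actual classes). The key point carries over: the listener uses the transactional connection it is handed, so the history row lives and dies with the flushing transaction:

```python
# Modern replacement for MapperExtension: a mapper-level event listener
# registered with sqlalchemy.event, executing on the passed connection.
import datetime
from sqlalchemy import (Column, DateTime, Integer, Unicode,
                        create_engine, event)
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class History(Base):
    __tablename__ = 'history'
    id = Column(Integer, primary_key=True)
    table_name = Column(Unicode(64))
    operation = Column(Unicode(16))
    timestamp = Column(DateTime)

class Location(Base):
    __tablename__ = 'location'
    code = Column(Unicode(64), primary_key=True)

@event.listens_for(Location, 'after_insert')
def record_insert(mapper, connection, target):
    # Crucially, use the transactional connection passed to the event,
    # so the history INSERT joins the flushing transaction instead of
    # autocommitting through the engine.
    connection.execute(History.__table__.insert().values(
        table_name=mapper.local_table.name,
        operation=u'insert',
        timestamp=datetime.datetime.now()))

engine = create_engine('sqlite:///:memory:')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine, autoflush=False)()

session.add(Location(code=u'mario'))
session.flush()      # emits the Location INSERT plus the history INSERT
session.rollback()   # both roll back together: no stray COMMIT

count = session.query(History).count()
print(count)  # 0
```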


Regarding your views about the session I quote "Session Basics", which you really should read through:

In the most general sense, the Session establishes all conversations with the database and represents a “holding zone” for all the objects which you’ve loaded or associated with it during its lifespan. It provides the entrypoint to acquire a Query object, which sends queries to the database using the Session object’s current database connection, ...

and "Is the Session a cache?":

Yeee…no. It’s somewhat used as a cache, in that it implements the identity map pattern, and stores objects keyed to their primary key. However, it doesn’t do any kind of query caching. This means, if you say session.query(Foo).filter_by(name='bar'), even if Foo(name='bar') is right there, in the identity map, the session has no idea about that. It has to issue SQL to the database, get the rows back, and then when it sees the primary key in the row, then it can look in the local identity map and see that the object is already there. It’s only when you say query.get({some primary key}) that the Session doesn’t have to issue a query.

So:

My expectation from something which reads session.query is that it queries the session...

Your expectations are wrong. The Session handles talking to the DB – among other things.
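The identity-map point from the quote above can be seen in a short sketch (standalone, with an illustrative model; Session.get is the modern spelling of query.get): a primary-key lookup is served from the identity map, returning the very same Python object:

```python
# Sketch of the identity map: after a flush, Session.get() by primary
# key returns the in-session object itself, while a filter_by() query
# must round-trip to the database before recognizing it.
from sqlalchemy import Column, Unicode, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Location(Base):
    __tablename__ = 'location'
    code = Column(Unicode(64), primary_key=True)

engine = create_engine('sqlite:///:memory:')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine, autoflush=False)()

u1 = Location(code=u'mario')
session.add(u1)
session.flush()  # u1 is now in the identity map, keyed by its primary key

u2 = session.get(Location, u'mario')  # served from the identity map
print(u1 is u2)  # True
```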

There's two reasons why I don't want to do session.flush before my session.query,

  • one based on efficiency fears (why should I write to the database, and query the database if I am still within a session which the user may want to rollback?);

Because your DB may do validation, have triggers, and generate values for some columns – primary keys, timestamps, and the like. The data you thought you were inserting may end up as something else in the DB, and the Session has absolutely no way to know about that.

Also, why should SQLAlchemy implement a sort of in-memory DB within itself, with its own query engine, and all the problems that come with synchronizing two databases? How would SQLAlchemy support all the different operations and functions of the various DBs you query against? Your simple equality-predicate example just scratches the surface.

When you rollback, you roll back the DB's transaction (along with the session's unflushed changes).

  • two is because I've adopted a fairly large program, and it manages to define its own Session whose instances cause flush to also commit.

Caused by the event handling bug.
