ACV
ACV

Reputation: 1975

Understanding SQLAlchemy performance when iteration over an expired query and updating

I was iterating over a query in SQLAlchemy and performing updates on the rows and I was experiencing unexpectedly poor performance, orders of magnitude slower that I anticipated. The database has about 12K rows. My original query and loop looked like this:

query_all = session.query(MasterImages).all()

for record_counter, record in enumerate(query_all):
    # Some stuff happens here, set_id and set_index are defined
    session.query(MasterImages).\
            filter(MasterImages.id == record.id).\
            update({'set_id':set_id, 'set_index':set_index})
    if record_counter % 100 == 0:
        session.commit()
        print 'Updated {:,} records'.format(record_counter)
session.commit()

The first iteration through the loop was very fast but then it would seemingly stop after the first commit. I tried a bunch of different approaches but was getting no where. Then I tried changing my query so it only selected the fields I needed to calculate the set_id and set_index values I use in my update, like this:

query_all = session.query(MasterImages.id, MasterImages.project_id, 
                          MasterImages.visit, MasterImages.orbit, 
                          MasterImages.drz_mode, MasterImages.cr_mode).all()

This produced the performance I was expecting, plowing through all the records in well under a minute. After thinking about it for a while I think my issue was that in my first query, after the commit, turned into a stale query because I had updated a field that was (unnecessarily) in the query I was iterating over. This, I think, forced Alchemy to regenerate the query after every commit. By removing the fields I was updating from the query I was iterating over I think I was able to use the same query and this resulted in my performance increase.

Am I correct?

Upvotes: 1

Views: 860

Answers (1)

univerio
univerio

Reputation: 20518

Turn off expire_on_commit.

With expire_on_commit on, SQLAlchemy marks all of the objects in query_all as expired (or "stale", as you put it) after the .commit(). What being expired means is that the next time you try to access an attribute on your object SQLAlchemy will issue a SELECT to refresh the object. Turning off expire_on_commit will prevent it from doing that. expire_on_commit is an option so that naive uses of the ORM are not broken, so if you know what you're doing you can safely turn it off.

Your fix works because when you specify columns (MasterImages.id, etc) instead of a mapped class (MasterImages) in your query(), SQLAlchemy returns to you a plain python tuple instead of an instance of the mapped class. The tuple does not offer ORM features, i.e. it does not expire and it will never re-fetch itself from the database.

Upvotes: 2

Related Questions