Reputation: 1975
I was iterating over a query in SQLAlchemy and performing updates on the rows and I was experiencing unexpectedly poor performance, orders of magnitude slower that I anticipated. The database has about 12K rows. My original query and loop looked like this:
query_all = session.query(MasterImages).all()
for record_counter, record in enumerate(query_all):
# Some stuff happens here, set_id and set_index are defined
session.query(MasterImages).\
filter(MasterImages.id == record.id).\
update({'set_id':set_id, 'set_index':set_index})
if record_counter % 100 == 0:
session.commit()
print 'Updated {:,} records'.format(record_counter)
session.commit()
The first iteration through the loop was very fast but then it would seemingly stop after the first commit. I tried a bunch of different approaches but was getting no where. Then I tried changing my query so it only selected the fields I needed to calculate the set_id
and set_index
values I use in my update, like this:
query_all = session.query(MasterImages.id, MasterImages.project_id,
MasterImages.visit, MasterImages.orbit,
MasterImages.drz_mode, MasterImages.cr_mode).all()
This produced the performance I was expecting, plowing through all the records in well under a minute. After thinking about it for a while I think my issue was that in my first query, after the commit, turned into a stale query because I had updated a field that was (unnecessarily) in the query I was iterating over. This, I think, forced Alchemy to regenerate the query after every commit. By removing the fields I was updating from the query I was iterating over I think I was able to use the same query and this resulted in my performance increase.
Am I correct?
Upvotes: 1
Views: 860
Reputation: 20518
Turn off expire_on_commit
.
With expire_on_commit
on, SQLAlchemy marks all of the objects in query_all as expired (or "stale", as you put it) after the .commit(). What being expired means is that the next time you try to access an attribute on your object SQLAlchemy will issue a SELECT to refresh the object. Turning off expire_on_commit will prevent it from doing that. expire_on_commit is an option so that naive uses of the ORM are not broken, so if you know what you're doing you can safely turn it off.
Your fix works because when you specify columns (MasterImages.id, etc) instead of a mapped class (MasterImages) in your query(), SQLAlchemy returns to you a plain python tuple instead of an instance of the mapped class. The tuple does not offer ORM features, i.e. it does not expire and it will never re-fetch itself from the database.
Upvotes: 2