Reputation: 4080
Got a script that was running very fast (about 20 seconds to process 30,000 records) when I was working with around 100,000 records. The script grabs records from a PostgreSQL database, processes them, and then marks in the database that those records have been processed.
The problem is that I've now pointed the script at a database with 50 million records, and now 10,000 records take about 160 seconds! That is extremely slow.
Is there something I can do to speed up my updates?
My Python and SQLAlchemy Core code is:
from sqlalchemy import bindparam

def process_records(no_of_records, data):
    is_processed_list = []
    for x in range(no_of_records):
        my_data = data[x]
        # process the data; when done, mark as is_processed
        # (pid and is_processed come out of processing my_data)
        params = {}
        params['b_id'] = pid
        params['is_processed'] = is_processed
        is_processed_list.append(params)

    CONN = Engine.connect()
    trans = CONN.begin()
    stmt = mytable.update().\
        where(mytable.c.id == bindparam('b_id')).\
        values(is_processed=bindparam('is_processed'))
    # passing a list of dicts makes execute() run an executemany-style bulk UPDATE
    CONN.execute(stmt, is_processed_list)
    trans.commit()
EDIT
I profiled my code and realized that the problem was not with this Python loop or even with the bulk update. The place where I was losing 80+ seconds was in selecting the data to process in the first place. Once I figured that out, I followed @Martin's suggestion, added an index, and it's back to rocket speeds!
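For reference, a minimal sketch of what the index creation can look like in SQLAlchemy, assuming the slow SELECT filters on the is_processed flag (the actual column(s) to index depend on that query's WHERE clause):

from sqlalchemy import Index

# CREATE INDEX on the column the selection query filters by (assumed here)
ix = Index('ix_mytable_is_processed', mytable.c.is_processed)
ix.create(Engine)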
Upvotes: 0
Views: 3309
Reputation: 1639
You should check what the query planner tells you. Prefix your SQL query with EXPLAIN to get details about what PostgreSQL is planning to do. First, just try to SELECT the rows that should be updated and see what happens.
If it's the selection part (reading) that is slow, it can likely be sped up with an index on the column used in this condition (mytable.c.id == b_id). Note that in PostgreSQL, a FOREIGN KEY does not create an index on your table.
See here for details about EXPLAIN: http://www.postgresql.org/docs/current/static/sql-explain.html
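As a sketch, assuming the rows to process are picked out by the is_processed flag (that column name is an assumption), you can run EXPLAIN straight from Python:

from sqlalchemy import text

# Ask Postgres for the plan of the selection query; a 'Seq Scan' over the
# 50-million-row table here is the usual sign that an index is missing.
with Engine.connect() as conn:
    plan = conn.execute(text(
        "EXPLAIN SELECT * FROM mytable WHERE is_processed = false"
    ))
    for line in plan:
        print(line[0])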
If it's the writing part that is slow, you may need to tune the PostgreSQL configuration.
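A small sketch of checking a few write-related settings as a starting point (which ones actually matter depends on your workload):

from sqlalchemy import text

# SHOW reports the current value of a Postgres configuration setting
with Engine.connect() as conn:
    for name in ('synchronous_commit', 'shared_buffers', 'checkpoint_completion_target'):
        value = conn.execute(text("SHOW " + name)).scalar()
        print(name, '=', value)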
Upvotes: 2