Reputation: 1
I'm writing a script in Python that needs to access a PostgreSQL database multiple times and execute a number of SELECT and INSERT queries. I am trying to reduce the time it takes for this script to run.
Currently I have written a secondary function to which I pass a query string, a boolean indicating whether I am inserting or receiving data, and a list of parameters, and then execute the query:
import psycopg2

def sql_call(qry, insert, inputlist):
    params = config_np()                      # read connection parameters
    with psycopg2.connect(**params) as conn:  # a new connection on every call
        cur = conn.cursor()
        try:
            cur.execute(qry, inputlist)
            if insert:
                conn.commit()
                sqlrtn = True
            else:
                sqlrtn = cur.fetchall()
        except (Exception, psycopg2.DatabaseError) as error:
            print(error)
            quit()
        conn.close()
    return sqlrtn
I'm working with a few hundred thousand entries and this takes forever to run. Is there a faster way to do it?
Upvotes: 0
Views: 3604
Reputation: 3807
A few things you can do. First, don't re-establish the connection with each query. A single connection can be reused across many queries, so there is no need to recreate it every time. If you still want the flexibility of a function that executes a query, create a class whose __init__ method opens the connection and keeps it in self.conn, and whose __del__ method closes it, as in the sketch below.
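A minimal sketch of such a class (the class name Database is arbitrary, and config_np() is assumed to be your existing helper that returns the connection keyword arguments):

import psycopg2

class Database:
    def __init__(self):
        params = config_np()                    # your existing config helper (assumed)
        self.conn = psycopg2.connect(**params)  # opened once, reused for every query

    def sql_call(self, qry, insert, inputlist):
        cur = self.conn.cursor()
        cur.execute(qry, inputlist)
        if insert:
            self.conn.commit()
            return True
        return cur.fetchall()

    def __del__(self):
        self.conn.close()                       # closed when the object is discarded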
For INSERT operations, you can insert multiple rows in a single statement by listing them after VALUES:
INSERT INTO table (fld1, fld2, fld3)
VALUES ('some', 'data', 'here'),
       ('more', 'data', 'here');
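If you want to build that multi-row statement from Python rather than by hand, psycopg2.extras.execute_values (available since psycopg2 2.7) can generate it for you. A minimal sketch, reusing the placeholder table and column names from the example above and assuming conn is the already-open connection:

from psycopg2.extras import execute_values

rows = [('some', 'data', 'here'),
        ('more', 'data', 'here')]

with conn.cursor() as cur:
    # execute_values expands the single %s into one VALUES tuple per row
    execute_values(
        cur,
        "INSERT INTO table (fld1, fld2, fld3) VALUES %s",
        rows,
        page_size=1000,   # rows sent per statement; tune to taste
    )
    conn.commit()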
psycopg2 will accept quite a lot of rows in a query like this. For the SELECT queries, the best approach depends on what your program is doing. Depending on your requirements, you could cache quite a lot of data in memory, given the amount available on computers these days.
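As a purely hypothetical illustration of that kind of in-memory caching (the query_cache dictionary, the cached_select helper, and the db object are assumptions, not part of the answer), repeated SELECTs could be served from a dict keyed by the query and its parameters:

# db is assumed to be an instance of the connection-holding class sketched above
query_cache = {}

def cached_select(db, qry, params):
    key = (qry, tuple(params))          # parameters must be hashable
    if key not in query_cache:
        query_cache[key] = db.sql_call(qry, False, params)
    return query_cache[key]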
Upvotes: 1