Reputation: 689
I wrote a script in Python that connects to a database using `psycopg`. I'm working with two tables in the same database. First I run a `SELECT * FROM table_1`, and after that I check whether each row (fetched with `fetchone`) meets some requirements. If it does, a new row is created in `table_2` with an `INSERT`. The code looks similar to this:
cursor_1 = conn.cursor()
cursor_2 = conn.cursor()
cursor_1.execute("SELECT * FROM table_1")
for i in range(1, n):
    a = cursor_1.fetchone()
    if condition_in_a:
        cursor_2.execute("INSERT INTO table_2 (f1, f2) VALUES (v1, v2)")
The script runs in a sequential `for` loop, and I wanted to know if there's a better way to accomplish this task, for instance through `psycopg2.pool`, in terms of performance. So my question is: is it possible to multithread a `psycopg` cursor? If not, what would be the best approach to achieve better performance?
Thanks in advance.
Upvotes: 2
Views: 703
Reputation: 8039
In a single-threaded approach you are trying to balance memory usage against row-processing speed. `fetchone` keeps memory usage low but requires more round trips to fetch the data; `fetchall` on huge datasets uses a large amount of memory, and loading all the data may take some time before processing even starts.
So, at the least, a better way is to fetch the data in smaller portions with `fetchmany`. The following generator may simplify the process:
def ResultIter(cursor, limit=100):
    # Yield rows one at a time, fetching them from the server in
    # batches of `limit` to keep memory usage bounded.
    while True:
        results = cursor.fetchmany(limit)
        if not results:
            break
        for result in results:
            yield result
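To show how the generator batches its fetches, here is a quick self-contained sketch that exercises it against a minimal stand-in cursor. The `FakeCursor` class is purely illustrative (it only mimics `fetchmany`); with psycopg you would pass a real cursor instead:

```python
def ResultIter(cursor, limit=100):
    # Yield rows one at a time, fetching them in batches of `limit`.
    while True:
        results = cursor.fetchmany(limit)
        if not results:
            break
        for result in results:
            yield result

class FakeCursor:
    """Hypothetical stand-in for a psycopg cursor: serves rows via fetchmany()."""
    def __init__(self, rows):
        self._rows = list(rows)
        self._pos = 0

    def fetchmany(self, size):
        batch = self._rows[self._pos:self._pos + size]
        self._pos += size
        return batch

cursor = FakeCursor([(i, i * 2) for i in range(250)])
rows = list(ResultIter(cursor, limit=100))  # three fetchmany calls: 100 + 100 + 50
print(len(rows))  # 250
```

The caller simply iterates over `ResultIter(cursor)` exactly as it would over a list from `fetchall`, but at most `limit` rows are held in the client at a time.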
Based on this approach you can implement multithreaded processing as well, but I'm really not sure about the performance.
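If you do go multithreaded, the usual pattern is a producer/consumer queue: one thread reads rows, several workers process them. The sketch below uses plain Python objects in place of database rows; `process_rows`, `worker`, and the condition (keep even-numbered rows) are all hypothetical stand-ins for your own logic. In real code, psycopg2 connections are thread-safe but cursors are not, so each worker should run its `INSERT` on its own connection, e.g. taken from a `psycopg2.pool.ThreadedConnectionPool`:

```python
import queue
import threading

def worker(q, out, lock):
    # In real code this thread would hold its own DB connection/cursor
    # and run the INSERT; here it just collects matching rows.
    while True:
        row = q.get()
        if row is None:          # sentinel: no more rows
            q.task_done()
            break
        if row[0] % 2 == 0:      # stand-in for condition_in_a
            with lock:
                out.append(row)
        q.task_done()

def process_rows(rows, n_workers=4):
    q = queue.Queue(maxsize=100)  # bounded queue caps memory, like fetchmany
    out, lock = [], threading.Lock()
    threads = [threading.Thread(target=worker, args=(q, out, lock))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for row in rows:              # in real code: for row in ResultIter(cursor)
        q.put(row)
    for _ in threads:
        q.put(None)               # one sentinel per worker
    for t in threads:
        t.join()
    return out

matched = process_rows([(i,) for i in range(10)])
print(sorted(matched))  # [(0,), (2,), (4,), (6,), (8,)]
```

Whether this actually beats the single-threaded loop depends on where the time goes: if the bottleneck is the database round trips rather than the per-row Python work, batching the inserts is likely to help more than adding threads.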
Upvotes: 1