Justtruggingalong

Reputation: 91

What is the fastest/most efficient way to pull data from PostgreSQL into Python with psycopg2?

The question is what's written on the tin. I'm currently working on a project with the following workflow:

  1. pull all the data from a workspace into Python
  2. search every column for a list of key terms
  3. return the hits
  4. move on to the next database and repeat

I have been able to squeeze some extra efficiency out of the loop by running concurrent.futures on the key-term search (steps 2 and 3), which lets me use all cores simultaneously. Now I want to see whether I can gain a bit more by speeding up the database calls (step 1).
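For context, the concurrent part of steps 2 and 3 looks roughly like this (a sketch only; search_column and keyterms stand in for my actual helper and term list):

from concurrent.futures import ProcessPoolExecutor


def search_column(args):
    # placeholder helper: scan one column's values for any of the key terms
    col, values, keyterms = args
    hits = [v for v in values if any(term in str(v) for term in keyterms)]
    return col, hits


# temp is the DataFrame pulled in step 1, keyterms is my list of search terms
with ProcessPoolExecutor() as pool:
    tasks = [(col, temp[col].tolist(), keyterms) for col in temp.columns]
    results = dict(pool.map(search_column, tasks))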

Here is my current code:

import psycopg2 as pg
import pandas as pd


conn = pg.connect(
         host=host,
         database=database,
         user=username,       # psycopg2 expects "user", not "username"
         password=password
         )

SQLselect = '''
            select *
            from {}
            '''
for database in databases:
    cur = conn.cursor(database)   # named (server-side) cursor for each table
    cur.execute(SQLselect.format(database))
    rows = cur.fetchall()
    cols = [desc[0] for desc in cur.description]
    temp = pd.DataFrame(rows, columns=cols)
    cur.close()

As an alternative, I also tried psycopg2's copy_to method. I figured this would be faster, since copy_from works so well, but it actually ended up being slower than the code above. Is there any way I could speed this up or do this more efficiently?
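For reference, the COPY-based variant I tried was along these lines (a rough sketch, not the exact code; the extra "limit 0" query is only there to get a header, since COPY writes raw rows without one):

import io

import pandas as pd


cur = conn.cursor()
# grab the column names first, because copy_to does not emit a header row
cur.execute('select * from {} limit 0'.format(database))
cols = [desc[0] for desc in cur.description]

buf = io.StringIO()
cur.copy_to(buf, database, sep='\t')   # stream the whole table into memory
buf.seek(0)
temp = pd.read_csv(buf, sep='\t', header=None, names=cols, na_values='\\N')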

Upvotes: 3

Views: 1958

Answers (1)

Greg

Reputation: 1991

If the goal is to execute SQL on all of those DBs as quickly as possible, and they are not order-dependent, then you could likely run them all in parallel/concurrently.

I would take a look at the link here for some potential solutions.
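As a rough sketch of what I mean (fetch_table is a placeholder, and the connection parameters are the ones from your question; each worker opens its own connection so cursors are never shared across threads):

from concurrent.futures import ThreadPoolExecutor

import pandas as pd
import psycopg2


def fetch_table(name):
    # one connection per worker; adjust host/user/etc. to your setup
    conn = psycopg2.connect(host=host, database=database,
                            user=username, password=password)
    try:
        cur = conn.cursor()
        cur.execute('select * from {}'.format(name))
        cols = [desc[0] for desc in cur.description]
        return pd.DataFrame(cur.fetchall(), columns=cols)
    finally:
        conn.close()


with ThreadPoolExecutor(max_workers=len(databases)) as pool:
    frames = list(pool.map(fetch_table, databases))

Threads work fine here because the workers spend most of their time waiting on the server, so the GIL is not the bottleneck.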

Upvotes: 2
