Justtruggingalong

Reputation: 91

What is the fastest/most efficient way to pull data from PostgreSQL into Python with psycopg2?

The question is what's written on the tin. I'm currently working on a project with the following workflow:

  1. pull all the data from a workspace into Python
  2. search every column for a list of key terms
  3. return the hits
  4. move on to the next database and repeat

I have been able to squeeze some extra efficiency out of the loop by running concurrent.futures on the key-term search (steps 2 and 3), which lets me use all cores simultaneously. Now I want to see whether I can gain a bit more by speeding up the database calls (step 1).
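For context, the concurrent part of steps 2 and 3 looks roughly like this (a sketch only; search_column and keyterms stand in for my actual helper and term list):

from concurrent.futures import ProcessPoolExecutor


def search_column(args):
    # placeholder helper: scan one column's values for any of the key terms
    col, values, keyterms = args
    hits = [v for v in values if any(term in str(v) for term in keyterms)]
    return col, hits


# temp is the DataFrame pulled in step 1, keyterms is my list of search terms
with ProcessPoolExecutor() as pool:
    tasks = [(col, temp[col].tolist(), keyterms) for col in temp.columns]
    results = dict(pool.map(search_column, tasks))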

Here is my current code:

import psycopg2 as pg
import pandas as pd


conn = pg.connect(
         host=host,
         database=database,
         user=username,       # psycopg2 expects "user", not "username"
         password=password
         )

SQLselect = '''
            select *
            from {}
            '''
for database in databases:
    cur = conn.cursor(database)   # named (server-side) cursor for each table
    cur.execute(SQLselect.format(database))
    rows = cur.fetchall()
    cols = [desc[0] for desc in cur.description]
    temp = pd.DataFrame(rows, columns=cols)
    cur.close()

As an alternative, I also tried psycopg2's copy_to method. I figured this would be faster, since copy_from works so well, but it actually ended up being slower than the code above. Is there any way I could speed this up or do this more efficiently?
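For reference, the COPY-based variant I tried was along these lines (a rough sketch, not the exact code; the extra "limit 0" query is only there to get a header, since COPY writes raw rows without one):

import io

import pandas as pd


cur = conn.cursor()
# grab the column names first, because copy_to does not emit a header row
cur.execute('select * from {} limit 0'.format(database))
cols = [desc[0] for desc in cur.description]

buf = io.StringIO()
cur.copy_to(buf, database, sep='\t')   # stream the whole table into memory
buf.seek(0)
temp = pd.read_csv(buf, sep='\t', header=None, names=cols, na_values='\\N')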

Upvotes: 3

Views: 1958

Answers (1)

Greg

Reputation: 1991

If the goal is to execute SQL on all of those DBs as quickly as possible, and they are not order-dependent, then you could likely run them all in parallel/concurrently.

I would take a look at the link here for some potential solutions.
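As a rough sketch of what I mean (fetch_table is a placeholder, and the connection parameters are the ones from your question; each worker opens its own connection so cursors are never shared across threads):

from concurrent.futures import ThreadPoolExecutor

import pandas as pd
import psycopg2


def fetch_table(name):
    # one connection per worker; adjust host/user/etc. to your setup
    conn = psycopg2.connect(host=host, database=database,
                            user=username, password=password)
    try:
        cur = conn.cursor()
        cur.execute('select * from {}'.format(name))
        cols = [desc[0] for desc in cur.description]
        return pd.DataFrame(cur.fetchall(), columns=cols)
    finally:
        conn.close()


with ThreadPoolExecutor(max_workers=len(databases)) as pool:
    frames = list(pool.map(fetch_table, databases))

Threads work fine here because the workers spend most of their time waiting on the server, so the GIL is not the bottleneck.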

Upvotes: 2
