codingEnthusiast

Reputation: 51

Performance issue with psycopg2 named cursor in python

After some research, I found out that a nameless cursor in Python using psycopg2 will try to load the entire result set into my computer's memory. This is a big issue for me, as the PostgreSQL table I am querying is about 1.4 TB in size. I found that giving my cursor a name creates a server-side cursor that only loads the number of rows I ask for with fetchmany, but queries have become significantly slower. Is there a way for me to speed up the server-side cursor? (I only load about 2,000 rows at a time, as that is roughly the batch size I need.)
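Roughly, this is the pattern I am using (the connection string and cursor name are simplified placeholders):

import psycopg2

# Named (server-side) cursor consumed in 2,000-row batches; the
# connection string below is a placeholder.
conn = psycopg2.connect("dbname=mydb")
cursor = conn.cursor(name='my_cursor')  # naming the cursor makes it server-side

cursor.execute("SELECT * FROM ...")

while True:
    rows = cursor.fetchmany(2000)  # each call is its own FETCH round trip
    if not rows:
        break
    # process the 2,000-row batch

cursor.close()
conn.close()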

Upvotes: 3

Views: 1643

Answers (1)

Seth

Reputation: 466

I'm not 100% positive that this will fix your performance issue, but you do not need to use cursor.fetchmany() with a server-side cursor. You can simply iterate over a named cursor.

import psycopg2

with psycopg2.connect(db_uri_string) as conn:
    cursor = conn.cursor(name='cursor_name')  # naming the cursor makes it server-side
    cursor.itersize = 20000  # rows fetched from the backend per network round trip

    query = "SELECT * FROM ..."
    cursor.execute(query)

    for row in cursor:
        ...  # process row

Also, the lower your itersize, the more network round trips are made, and that may be slowing down your performance, especially with such a large dataset and such a low itersize. When you iterate over a named cursor, psycopg2 fetches itersize rows from the backend per round trip, so raising it above the default of 2000 cuts down the number of trips. If you need to work in batches of 2,000, keep the larger itersize and slice the iterator into 2,000-row chunks on the client side; note that fetchmany(2000) on a named cursor sends its own FETCH of exactly 2,000 rows on each call, so it does not benefit from a larger itersize.

For example, with itersize set to 20000, iterating over the cursor makes one network round trip per 20,000 rows, i.e. one round trip for every ten 2,000-row batches.
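As a rough sketch of that chunking pattern (the cursor name is a placeholder, and db_uri_string is the same placeholder as above):

import itertools
import psycopg2

# Rough sketch: a large itersize keeps round trips rare, while
# itertools.islice yields 2,000-row batches on the client side.
with psycopg2.connect(db_uri_string) as conn:
    cursor = conn.cursor(name='batch_cursor')
    cursor.itersize = 20000  # 20,000 rows per network round trip

    cursor.execute("SELECT * FROM ...")

    while True:
        batch = list(itertools.islice(cursor, 2000))
        if not batch:
            break
        # process the 2,000-row batch here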

Upvotes: 2
