thosphor

Reputation: 2781

When does psycopg execute cursor commands

I'm doing a simple query on a database with psycopg2. Currently it's a test database with only a few entries but I'm worried the commands I'm executing will become too memory-intense when migrated to a real database which could be large in size.

import psycopg2

connection = psycopg2.connect('<database_url>')
cur = connection.cursor()

cur.execute("SELECT * FROM test;")
entries = cur.fetchmany(limit)

So my question is this: does psycopg2 run the database query at the execute line (in which case a large table will consume a lot of client memory), or does it not run the query until the fetchmany line (in which case I can control the Python memory consumption)?

If the former, presumably I need to change my SQL query to only return x rows.

Upvotes: 0

Views: 959

Answers (1)

bimsapi

Reputation: 5065

To elaborate on the comment from @Arihant, the statement runs when execute is called, so any memory cost at that point falls on the database server, not the client. Chances are you won't need to worry about that (yet): modestly sized databases handle millions of rows efficiently, and there are a lot of optimizations and memory-management strategies in use on the server, although server memory can become a problem if the query itself gets complicated.

Client memory comes into play once you start fetching data. I use one of two strategies to keep memory usage under control:

Use fetchmany() instead of fetchall() (or instead of iterating the cursor directly). LIMIT/OFFSET queries can get the same basic result, but you need to execute many such queries to page through a large data set, which incurs unnecessary cost on the server.

with conn.cursor() as stmt:
    stmt.execute('select * from big_table')
    rows = stmt.fetchmany(10000)
    while rows:
        for row in rows:
            ...  # do something clever with the row
        rows = stmt.fetchmany(10000)
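
The fetchmany() loop can also be packaged as a small generator, so the chunking logic lives in one place. A sketch — `iter_rows` is a made-up helper name, and it works with any DB-API cursor, psycopg2 included:

```python
# Sketch: wrap the fetchmany() pattern in a generator. Only `size`
# rows are materialized per batch on the Python side; callers just
# iterate and never see the batching.
def iter_rows(cur, size=10000):
    """Yield rows from `cur` in fetchmany() batches of `size`."""
    while True:
        rows = cur.fetchmany(size)
        if not rows:
            return
        yield from rows
```

Usage is then simply `for row in iter_rows(stmt): ...` after the execute call.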

Use a server-side cursor. This has slightly higher overhead on the server, but will keep memory essentially flat on the client (unless you are trying to hold all the records yourself).

with conn.cursor('my_cursor') as stmt:
    stmt.execute('select * from big_table')
    for row in stmt:
        ...  # do something clever with the row
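
If the round-trip batch size matters, psycopg2 named cursors expose an `itersize` attribute (default 2000) that controls how many rows each network fetch pulls while you iterate. A sketch, assuming `conn` is an open psycopg2 connection and `big_table` exists; `scan_big_table` is a made-up helper name:

```python
def scan_big_table(conn, batch=5000):
    """Stream rows from big_table through a psycopg2 named
    (server-side) cursor. `itersize` sets how many rows each network
    round trip fetches, so client memory stays bounded by `batch`."""
    with conn.cursor('my_cursor') as stmt:
        stmt.itersize = batch
        stmt.execute('select * from big_table')
        for row in stmt:
            yield row
```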

The key thing, regardless of approach, is what you do with the records coming back. If you are trying to hold them in memory, you will run out of memory. Process the data in chunks.

Upvotes: 1
