Reputation: 3269
I'm looking for advice on efficient ways to stream data incrementally from a Postgres table into Python. I'm implementing an online learning algorithm and want to read batches of training examples from the table into memory for processing. Any thoughts on good ways to maximize throughput? Thanks for your suggestions.
Upvotes: 14
Views: 15805
Reputation: 637
You may want to look into the Postgres LISTEN/NOTIFY functionality: https://www.postgresql.org/docs/9.1/static/sql-notify.html
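For reference, here is a minimal sketch of the standard psycopg2 listen loop; the connection string and the new_examples channel name are placeholders for your own setup:

import select
import psycopg2
import psycopg2.extensions

conn = psycopg2.connect("dbname=mydb")  # placeholder connection string
# Notifications are delivered outside transactions, so enable autocommit.
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)

cur = conn.cursor()
cur.execute("LISTEN new_examples;")  # placeholder channel name

while True:
    # Wait up to 5 seconds for the connection socket to become readable.
    if select.select([conn], [], [], 5) == ([], [], []):
        continue  # timed out; poll again
    conn.poll()
    while conn.notifies:
        notify = conn.notifies.pop(0)
        print("got NOTIFY:", notify.channel, notify.payload)

Note that a notification carries only a small payload, not the rows themselves, so you would still run a query to fetch the new training examples after each notification arrives.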
Upvotes: 0
Reputation: 44202
If you are using psycopg2, then you will want to use a named cursor; otherwise psycopg2 will read the entire result set into memory at once.
# conn is an open psycopg2 connection
cursor = conn.cursor("some_unique_name")  # named cursor -> server-side cursor
cursor.execute("SELECT aid FROM pgbench_accounts")
for record in cursor:
    something(record)  # process one row at a time
This will fetch the records from the server in batches of 2000 (the default value of itersize) and then parcel them out to the loop one at a time.
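Since the question asks about reading batches, fetchmany on the same kind of named cursor gives you explicit chunks instead of single rows; the batch size and the process_batch function below are placeholders:

cursor = conn.cursor("some_unique_name")
cursor.itersize = 5000  # rows fetched per round trip to the server
cursor.execute("SELECT aid FROM pgbench_accounts")
while True:
    batch = cursor.fetchmany(size=5000)  # one batch of training examples
    if not batch:
        break  # result set exhausted
    process_batch(batch)  # placeholder for your batch-processing step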
Upvotes: 23