Reputation: 1477
I have the following Postgres query where I am fetching data from table1 with rows ~25 million and would like to write the output of the below query into multiple files.
query = """ WITH sequence AS (
SELECT
a,
b,
c
FROM table1 )
select * from sequence;"""
Below is the python script to fetch the complete dataset. How can I modify the script to fetch it to multiple files (eg. each file has 10000 rows)
#IMPORT LIBRARIES ########################
import psycopg2
from pandas import DataFrame
#CREATE DATABASE CONNECTION ########################
connect_str = "dbname='x' user='x' host='x' " "password='x' port = x"
conn = psycopg2.connect(connect_str)
cur = conn.cursor()
conn.autocommit = True
cur.execute(query)
df = DataFrame(cur.fetchall())
Thanks
Upvotes: 1
Views: 11052
Reputation:
Here are 3 methods that may help
snippet
with conn.cursor(name='fetch_large_result') as cursor:
cursor.itersize = 20000
query = "SELECT * FROM ..."
cursor.execute(query)
for row in cursor:
....
snippet
conn = psycopg2.connect(conn_url)
cursor = conn.cursor(name='fetch_large_result')
cursor.execute('SELECT * FROM <large_table>')
while True:
# consume result over a series of iterations
# with each iteration fetching 2000 records
records = cursor.fetchmany(size=2000)
if not records:
break
for r in records:
....
cursor.close() # cleanup
conn.close()
Finally you could define the a SCROLL CURSOR
snippet
BEGIN MY_WORK;
-- Set up a cursor:
DECLARE scroll_cursor_bd SCROLL CURSOR FOR SELECT * FROM My_Table;
-- Fetch the first 5 rows in the cursor scroll_cursor_bd:
FETCH FORWARD 5 FROM scroll_cursor_bd;
CLOSE scroll_cursor_bd;
COMMIT MY_WORK;
Please note Not naming the cursor in psycopg2 will cause the cursor to be client side as opposed to server side.
Upvotes: 10