Reputation: 13
I'm reading a Postgres table, extracting the data, and loading it into a CSV file. This works for tables up to about 5 GB and the CSV file is created successfully, but one of my tables is 35 GB and the process gets killed before the file is written.
I suspect my dataframe cannot handle data of that size.
What can we do to overcome this and create the CSV files successfully?
import sys

import pandas as pd
import psycopg2


def table_to_csv(sql, file_path, dbname, port, user):
    """Create a csv file from a PostgreSQL query."""
    try:
        conn = psycopg2.connect(dbname=dbname, port=port, user=user)
        print("Connecting to Database")
        # Get data into pandas dataframe
        df = pd.read_sql(sql, conn)
        # Write to csv file
        df.to_csv(file_path, encoding='utf-8', header=True, doublequote=True, sep=',', index=False)
        print("CSV File has been created")
        conn.close()
    except Exception as e:
        print("Error: {}".format(str(e)))
        sys.exit(1)
Upvotes: 1
Views: 3263
Reputation: 13
This worked using a cursor and its copy_expert function, which streams the rows straight into the file instead of loading them into a dataframe. Here is the code snippet:
import psycopg2.extras

# conn is the open psycopg2 connection from above
cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)

# Stream the table directly into the CSV file
with open("/sample.csv", 'w') as fd:
    cur.copy_expert("COPY sample TO STDOUT WITH (FORMAT CSV, HEADER TRUE, FORCE_QUOTE *)", fd)
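If you need the output of an arbitrary query rather than a whole table, copy_expert also accepts the COPY (SELECT ...) TO STDOUT form; a minimal sketch reusing the cursor above (the SELECT itself is just a placeholder):

# Hypothetical query - substitute your own SELECT
query_sql = "COPY (SELECT * FROM sample WHERE id > 100) TO STDOUT WITH (FORMAT CSV, HEADER TRUE)"
with open("/sample_subset.csv", 'w') as fd:
    cur.copy_expert(query_sql, fd)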
Upvotes: 0
Reputation: 136936
Since your database is running on the local machine, your most efficient option will probably be to use PostgreSQL's COPY
command, e.g. something like
COPY table_name TO file_path WITH (FORMAT csv, ENCODING UTF8, HEADER);
PostgreSQL will save the data directly to the file itself, without having to read it all into memory at once or have your Python code touch it at all.
You should be able to run this via psycopg2's standard cursor.execute function. Of course, you could also run it via psql or another PostgreSQL client of your choice.
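For example, a minimal sketch of running it from Python (the connection details, table name, and output path are placeholders; COPY ... TO a file runs on the server, so the path must be writable by the PostgreSQL server process):

import psycopg2

# Placeholder connection details - adjust for your environment
conn = psycopg2.connect(dbname="mydb", port=5432, user="me")
with conn, conn.cursor() as cur:
    # The output path is resolved on the database host, not the client
    cur.execute("COPY table_name TO '/tmp/table_name.csv' WITH (FORMAT csv, ENCODING 'UTF8', HEADER);")
conn.close()

Note that writing to a server-side file this way generally requires superuser rights or membership in the pg_write_server_files role; if that's a problem, psql's \copy or copy_expert (as in the other answer) produces the file on the client instead.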
Upvotes: 1