Ron
Ron

Reputation: 13

extracting a large Postgres table and writing it to a csv file using Python Pandas Data frame

I'm reading postgres table, extracting data and loading it into a csv file. The issue I have is that I'm able to read up to 5gb TABLE and successfully create a csv file. One of my tables is 35 GB and am unable to create a csv file, and the process is getting killed.

I suspect my dataframe is not able to handle large size.

What can we do to overcome this and create csv files successfully?

def table_to_csv(sql, file_path, dbname,port, user):

    """This function creates a csv file from PostgreSQL with query
    """
    try:
        conn = psycopg2.connect(dbname=dbname,  port=port, user=user)
        print("Connecting to Database")
        # Get data into pandas dataframe
        df = pd.read_sql(sql, conn)
        # Write to csv file
        df.to_csv(file_path, encoding='utf-8', header = True,doublequote = True, sep=',', index=False)
        print("CSV File has been created")
        conn.close()

    except Exception as e:
        print("Error: {}".format(str(e)))
        sys.exit(1)

Upvotes: 1

Views: 3263

Answers (2)

Ron
Ron

Reputation: 13

This worked with Cursor and its copy expert function. Here is the code snippet

  cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
    #execute
    with open("/sample.csv", 'w') as fd:

        cur.copy_expert("COPY sample TO STDOUT WITH (FORMAT CSV,  HEADER TRUE, 
        FORCE_QUOTE *)", fd)

Upvotes: 0

Chris
Chris

Reputation: 136936

Since your database is running on the local machine your most efficient option will probably be to use PostgreSQL's COPY command, e.g. something like

COPY table_name TO file_path WITH (FORMAT csv, ENCODING UTF8, HEADER);

PostgreSQL will save the data directly to the file itself, without having to read it all into memory at once or have your Python code touch it at all.

You should be able to run this via psycopg2's standard cursor.execute function. Of course, you could also run it via psql or another PostgreSQL client of your choice.

Upvotes: 1

Related Questions