Reputation: 141
I am trying to run a query on a table that has about 10 million rows. Basically I am running Select * from events
and writing the result to a CSV file.
Here is the code:
import csv
import json
import os
import sys

import psycopg2

with create_server_connection() as connection:
    cursor = connection.cursor()
    cursor.itersize = 20000
    cwd = os.getcwd()
    query = open(sql_file_path, mode='r').read()
    print(query)
    cursor.execute(query)
    with open(file_name, 'w', newline='') as fp:
        a = csv.writer(fp)
        for row in cursor:
            a.writerow(row)
def create_server_connection():
    DB_CONNECTION_PARAMS = os.environ["DB_REPLICA_CONNECTION"]
    json_object = json.loads(DB_CONNECTION_PARAMS)
    try:
        conn = psycopg2.connect(
            database=json_object["PGDATABASE"],
            user=json_object["PGUSER"],
            password=json_object["PGPASSWORD"],
            host=json_object["PGHOST"],
            port=json_object["PGPORT"],
        )
    except psycopg2.OperationalError as e:
        print('Unable to connect!\n{0}'.format(e))
        sys.exit(1)
    return conn
However, for some reason, this whole process is taking up a lot of memory. I am running this as an AWS Batch job, and the process exits with this error: OutOfMemoryError: Container killed due to memory usage
Is there a way to reduce memory usage?
Upvotes: 0
Views: 1266
Reputation: 6776
From the psycopg2 docs:
When a database query is executed, the Psycopg cursor usually fetches all the records returned by the backend, transferring them to the client process. If the query returned an huge amount of data, a proportionally large amount of memory will be allocated by the client.
If the dataset is too large to be practically handled on the client side, it is possible to create a server side cursor. Using this kind of cursor it is possible to transfer to the client only a controlled amount of data, so that a large dataset can be examined without keeping it entirely in memory.
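In your code, cursor.itersize has no effect because connection.cursor() without a name creates a regular client-side cursor, which pulls the whole result set into memory on execute(). Passing a name to cursor() declares a server-side cursor, and itersize then controls how many rows are fetched per round trip while you iterate. Here is a minimal sketch of how the export loop might look with a named cursor, reusing the create_server_connection(), sql_file_path, and file_name from your question (the cursor name 'events_export' is arbitrary):

import csv

with create_server_connection() as connection:
    # A named cursor is declared on the server; rows are streamed
    # to the client in batches of itersize instead of all at once.
    with connection.cursor(name='events_export') as cursor:
        cursor.itersize = 20000
        cursor.execute(open(sql_file_path, mode='r').read())
        with open(file_name, 'w', newline='') as fp:
            writer = csv.writer(fp)
            for row in cursor:
                writer.writerow(row)

Note that a named cursor can only run a single query and must be used inside a transaction, which is already the case here since autocommit is off; the with connection block commits when it exits.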
Upvotes: 1