Reputation: 141
I am trying to run a query on a table that has about 10 million rows. Basically I am running Select * from events
and writing the result to a CSV file.
Here is the code:
import csv
import json
import os
import sys

import psycopg2

with create_server_connection() as connection:
    cursor = connection.cursor()
    cursor.itersize = 20000
    cwd = os.getcwd()
    query = open(sql_file_path, mode='r').read()
    print(query)
    cursor.execute(query)
    with open(file_name, 'w', newline='') as fp:
        a = csv.writer(fp)
        for row in cursor:
            a.writerow(row)
def create_server_connection():
    DB_CONNECTION_PARAMS = os.environ["DB_REPLICA_CONNECTION"]
    json_object = json.loads(DB_CONNECTION_PARAMS)
    try:
        conn = psycopg2.connect(
            database=json_object["PGDATABASE"],
            user=json_object["PGUSER"],
            password=json_object["PGPASSWORD"],
            host=json_object["PGHOST"],
            port=json_object["PGPORT"],
        )
    except psycopg2.OperationalError as e:
        print('Unable to connect!\n{0}'.format(e))
        sys.exit(1)
    return conn
However, for some reason, this whole process is taking up a lot of memory. I am running this as an AWS Batch job, and the process exits with this error: OutOfMemoryError: Container killed due to memory usage
Is there a way to reduce memory usage?
Upvotes: 0
Views: 1266
Reputation: 6776
From the psycopg2 docs:
When a database query is executed, the Psycopg cursor usually fetches all the records returned by the backend, transferring them to the client process. If the query returned an huge amount of data, a proportionally large amount of memory will be allocated by the client.
If the dataset is too large to be practically handled on the client side, it is possible to create a server side cursor. Using this kind of cursor it is possible to transfer to the client only a controlled amount of data, so that a large dataset can be examined without keeping it entirely in memory.
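In your code, cursor.itersize has no effect because connection.cursor() without a name creates a regular client-side cursor, which pulls the whole result set into memory on execute(). Passing a name to cursor() declares a server-side cursor, and itersize then controls how many rows are fetched per round trip while you iterate. Here is a minimal sketch of how the export loop might look with a named cursor, reusing the create_server_connection(), sql_file_path, and file_name from your question (the cursor name 'events_export' is arbitrary):

import csv

with create_server_connection() as connection:
    # A named cursor is declared on the server; rows are streamed
    # to the client in batches of itersize instead of all at once.
    with connection.cursor(name='events_export') as cursor:
        cursor.itersize = 20000
        cursor.execute(open(sql_file_path, mode='r').read())
        with open(file_name, 'w', newline='') as fp:
            writer = csv.writer(fp)
            for row in cursor:
                writer.writerow(row)

Note that a named cursor can only run a single query and must be used inside a transaction, which is already the case here since autocommit is off; the with connection block commits when it exits.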
Upvotes: 1