Reputation: 450
I have a SQL Server database table in Amazon RDS, and I am running a Python script on an 8 GB EC2 instance in AWS. The script simply selects all the data from a large table and tries to write it out as a CSV on the EC2 instance. The instance quickly runs out of memory even though I am trying to extract the data year by year; I would like all of the data to end up in a CSV (I don't necessarily need to use Pandas).
As of now the pandas DataFrame code is very simple:
import pandas as pd

query = 'select * from table_name'
df = pd.read_sql(query, cnxn)          # cnxn is an existing database connection
df.to_csv(target_name, index=False)
The error I am seeing is:
Traceback (most recent call last):
  df = pd.read_sql(query, cnxn)
MemoryError
Upvotes: 1
Views: 5144
Reputation: 148910
If you can use read_sql with pandas, you certainly have a driver that lets you query the database directly through a DB-API 2.0 interface, and you can then use the csv module to write one record at a time:
import csv

con = ...  # it depends on your current driver
curs = con.cursor()
curs.execute('select * from table_name')
with open(target_name, 'w', newline='') as fd:
    wr = csv.writer(fd)
    wr.writerow([d[0] for d in curs.description])  # write the header line
    while True:                                     # loop on the cursor
        row = curs.fetchone()
        if not row:                                 # until the end of rows
            break
        wr.writerow(row)                            # and write the row
curs.close()
con.close()
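If fetching one row per call turns out to be slow, the same pattern also works with fetchmany(), which is part of the DB-API 2.0 spec as well. A minimal sketch, reusing con and target_name from above; the batch size of 10000 is an arbitrary assumption, tune it to your memory budget:

import csv

curs = con.cursor()
curs.execute('select * from table_name')
with open(target_name, 'w', newline='') as fd:
    wr = csv.writer(fd)
    wr.writerow([d[0] for d in curs.description])  # header line
    while True:
        rows = curs.fetchmany(10000)   # fetch a modest batch of rows
        if not rows:                   # empty list: cursor is exhausted
            break
        wr.writerows(rows)             # write the whole batch at once
curs.close()
con.close()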
Upvotes: 0
Reputation: 168986
You'll want to use your SQL database's native management tools instead of Python/Pandas here.
For MySQL:
mysql ... --batch --execute='select * from table_name' > my-file.csv
For PostgreSQL, in psql do something like:
\copy (select * from table_name) To './my-file.csv' With CSV
For SQL Server:
sqlcmd -S MyServer -d myDB -E -Q "select * from table_name" -o "my-file.csv" -h-1 -s"," -w 700
If you really do want to use Pandas, though, you might be able to get away with the chunksize parameter (lower it if you're still running out of memory):
with open('my_csv.csv', 'w') as f:
    for i, partial_df in enumerate(pd.read_sql(query, cnxn, chunksize=100000)):
        print('Writing chunk %s' % i)
        partial_df.to_csv(f, index=False, header=(i == 0))
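For reference, the cnxn used above is just whatever DB-API connection you already have. A minimal sketch with pyodbc, assuming the Microsoft ODBC driver is installed on the EC2 instance; the driver name, endpoint, database and credentials below are placeholders:

import pyodbc

# All connection details are placeholders; substitute your own
# RDS endpoint, database name and credentials.
cnxn = pyodbc.connect(
    'DRIVER={ODBC Driver 17 for SQL Server};'
    'SERVER=your-rds-endpoint.rds.amazonaws.com,1433;'
    'DATABASE=your_database;'
    'UID=your_user;'
    'PWD=your_password'
)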
Upvotes: 4