Reputation: 3716
I'm using Pandas' to_sql function to write to MySQL, which is timing out due to large frame size (1M rows, 20 columns).
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html
Is there a more official way to chunk through the data and write rows in blocks? I've written my own code, which seems to work. I'd prefer an official solution though. Thanks!
import pandas as pd
import sqlalchemy

def write_to_db(engine, frame, table_name, chunk_size):
    start_index = 0
    end_index = chunk_size if chunk_size < len(frame) else len(frame)
    # Replace NaN with None so MySQL stores NULLs
    frame = frame.where(pd.notnull(frame), None)
    if_exists_param = 'replace'
    while start_index != end_index:
        print "Writing rows %s through %s" % (start_index, end_index)
        frame.iloc[start_index:end_index, :].to_sql(con=engine, name=table_name, if_exists=if_exists_param)
        # After the first block, append instead of replacing the table
        if_exists_param = 'append'
        start_index = min(start_index + chunk_size, len(frame))
        end_index = min(end_index + chunk_size, len(frame))
engine = sqlalchemy.create_engine('mysql://...')  # database details omitted
write_to_db(engine, frame, 'retail_pendingcustomers', 20000)
Upvotes: 24
Views: 25005
Reputation: 139172
Update: this functionality has been merged in pandas master and will be released in 0.15 (probably end of September), thanks to @artemyk! See https://github.com/pydata/pandas/pull/8062
So starting from 0.15, you can specify the chunksize argument and, e.g., simply do:
df.to_sql('table', engine, chunksize=20000)
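For context, a minimal end-to-end sketch of that call; the connection string and table name are placeholders, and df stands for the question's DataFrame:

import sqlalchemy

# Placeholder connection string; substitute real credentials
engine = sqlalchemy.create_engine('mysql://user:password@host/dbname')

# pandas splits the frame into blocks of 20,000 rows and inserts them block by block
df.to_sql('retail_pendingcustomers', engine, if_exists='replace', chunksize=20000)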
Upvotes: 31
Reputation: 1096
There is a beautiful, idiomatic function chunks provided in an answer to this question.
In your case you can use it like this:
def chunks(l, n):
    """ Yield successive n-sized chunks from l.
    """
    for i in xrange(0, len(l), n):
        yield l.iloc[i:i + n]

def write_to_db(engine, frame, table_name, chunk_size):
    for idx, chunk in enumerate(chunks(frame, chunk_size)):
        # Replace the table on the first chunk, append the rest
        if idx == 0:
            if_exists_param = 'replace'
        else:
            if_exists_param = 'append'
        chunk.to_sql(con=engine, name=table_name, if_exists=if_exists_param)
The only drawback is that it doesn't support slicing the second index in the iloc call.
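For what it's worth, a sketch of a workaround (assuming the write_to_db above, the question's frame, and placeholder connection string and column names): select the columns you need before chunking, so iloc only ever has to slice rows.

import sqlalchemy

# Placeholder connection string; substitute real credentials
engine = sqlalchemy.create_engine('mysql://user:password@host/dbname')

# chunks() only slices rows, so pick the columns up front
# (column names here are hypothetical)
subset = frame[['customer_id', 'order_total']]
write_to_db(engine, subset, 'retail_pendingcustomers', 20000)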
Upvotes: 1