Reputation: 137
I am using SQLAlchemy for the first time to export around 6 million records to MySQL. This is the error I receive:
OperationalError: (mysql.connector.errors.OperationalError) 2055: Lost connection to MySQL server at '127.0.0.1:3306', system error: 10053 An established connection was aborted by the software in your host machine
Code:
import pandas as pd
import sqlalchemy
df = pd.read_excel(r"C:\Users\mazin\1-601.xlsx")
database_username = 'root'
database_password = 'aUtO1115'
database_ip = '127.0.0.1'
database_name = 'patenting in psis'
database_connection = sqlalchemy.create_engine(
    'mysql+mysqlconnector://{0}:{1}@{2}/{3}'.format(
        database_username, database_password,
        database_ip, database_name),
    pool_recycle=1, pool_timeout=30).connect()
df.to_sql(con=database_connection, name='sample', if_exists='replace')
database_connection.close()
Note: I do not get the error if I export around 100 records. After referring to similar posts, I added the pool_recycle and pool_timeout parameters, but the error still persists.
Upvotes: 3
Views: 4408
Reputation: 1068
The problem is that you're trying to insert 6 million rows as one chunk, and that takes time. With your current config, pool_recycle is set to 1 second, meaning the connection is recycled after 1 second, and that is definitely not enough time to insert 6 million rows. My suggestion:
database_connection = sqlalchemy.create_engine(
    'mysql+mysqlconnector://{0}:{1}@{2}/{3}'.format(
        database_username,
        database_password,
        database_ip,
        database_name
    ), pool_recycle=3600, pool_size=5).connect()
df.to_sql(
    con=database_connection,
    name='sample',
    if_exists='replace',
    chunksize=1000
)
This sets up a pool of 5 connections with a recycle time of 1 hour, and chunksize=1000 inserts 1,000 rows at a time (instead of all the rows in a single statement). You can experiment with the values to find the best performance.
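The chunked-insert idea can be demonstrated end to end. The sketch below is a self-contained, runnable illustration of the same to_sql call, but it substitutes an in-memory SQLite database for MySQL (so no server or credentials are needed) and a small generated DataFrame for the real Excel data; with the actual 6-million-row frame and a MySQL engine the call is identical.

```python
import pandas as pd
import sqlalchemy

# In-memory SQLite engine as a stand-in for the MySQL engine
# (assumption for this sketch; swap in the mysql+mysqlconnector URL).
engine = sqlalchemy.create_engine('sqlite://')

# Small stand-in for the DataFrame read from Excel.
df = pd.DataFrame({'a': range(10000), 'b': range(10000)})

# chunksize=1000 makes pandas issue one INSERT per 1000 rows instead of
# a single statement for the whole frame, so each round trip stays short
# and the connection is never held open long enough to be dropped.
df.to_sql(con=engine, name='sample', if_exists='replace',
          index=False, chunksize=1000)

# Verify that all rows were written.
with engine.connect() as conn:
    count = conn.execute(
        sqlalchemy.text('SELECT COUNT(*) FROM sample')).scalar()
print(count)  # 10000
```

Passing the engine itself (rather than a single .connect() result) also lets pandas check connections out of the pool as needed, which sidesteps the recycled-connection problem entirely.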
Upvotes: 5