Reputation: 435
I need to analyze data from sql server. The process is memory heavy which requires me to use my university's computing resources as opposed my personal computer. Due to university policy, I cannot access the sqlserver directly and the only workaround is to mysqldump everything into a .sql file, and then copy it to the university's supercomputer.
The problem I have now is opening the .sql file in pandas. I referred to Read External SQL File into Pandas Dataframe
and got this code:
# Read the sql file and execute the query
with open('/filepath/mydata.sql', 'r') as query:
connection = 'database'
sql = pd.read_sql_query(query.read(),connection, encoding='unicode_escape')
sql
I am getting this error, and I am not sure how to fix it.
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 6848294369: invalid start byte
I included encoding='unicode_escape'
which seems to fix the error in read_csv, but I have not been successful when using read_sql_query.
Also, when I do get read_sql_query, how would I download data from a certain table? I use a sqlalchemy connection and this code:
pd.read_sql(f"SELECT * FROM table", con=db_connection, chunksize = 100)
or pd.read_sql_table(f"table", con=db_connection, chunksize = 100)
to achieve that, but how would I do that using a .sql file?
Thank you for the help
Upvotes: 0
Views: 75