UnicodeDecodeError when opening a .sql file using pandas read_sql_query

I need to analyze data from sql server. The process is memory heavy which requires me to use my university's computing resources as opposed my personal computer. Due to university policy, I cannot access the sqlserver directly and the only workaround is to mysqldump everything into a .sql file, and then copy it to the university's supercomputer.

The problem I have now is opening the .sql file in pandas. I referred to Read External SQL File into Pandas Dataframe

and got this code:

# Read the sql file and execute the query
with open('/filepath/mydata.sql', 'r') as query:
    connection = 'database'
    sql = pd.read_sql_query(query.read(),connection, encoding='unicode_escape')
sql

I am getting this error, and I am not sure how to fix it.

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 6848294369: invalid start byte

I included encoding='unicode_escape' which seems to fix the error in read_csv, but I have not been successful when using read_sql_query.

Also, when I do get read_sql_query, how would I download data from a certain table? I use a sqlalchemy connection and this code: pd.read_sql(f"SELECT * FROM table", con=db_connection, chunksize = 100) or pd.read_sql_table(f"table", con=db_connection, chunksize = 100) to achieve that, but how would I do that using a .sql file?

Thank you for the help

Upvotes: 0

Views: 75

Answers (0)

Related Questions