Reputation: 8940
I am trying to export a table from pandas to a Microsoft SQL Server Express database.
Pandas reads a CSV file encodes as utf8. If I do df.head(), I can see that pandas shows the foreign characters correctly (they're Greek letters)
However, after exporting to SQL, those characters appear as combinations of question marks and zeros.
What am I doing wrong?
I can't find that to_sql() has any option to set the encoding. I guess I must change the syntax when setting up the SQL engine, but how exactly?
This is what I have been trying:
import numpy as np
import pandas as pd
from sqlalchemy import create_engine, MetaData, Table, select
import sqlalchemy as sqlalchemy
ServerName = my_server_name
Database = my_database
params = '?driver=SQL+Server+Native+Client+11.0'
engine = create_engine('mssql+pyodbc://' + ServerName + '/'+ Database + params, encoding ='utf_8', fast_executemany=True )
connection = engine.raw_connection()
cursor = connection.cursor()
file_name = my_file_name
df = pd.read_csv(file_name, encoding='utf_8', na_values=['null','N/A','n/a', ' ','-'] , dtype = field_map, thousands =',' )
print(df[['City','Municipality']].head()) # This works
Upvotes: 1
Views: 3269
Reputation: 66
In df.to_sql specify type for this columns. Use this
dtype= {'column_name1': sqlalchemy.NVARCHAR(length=50), 'column_name2': sqlalchemy.types.NVARCHAR(length=70)}
Upvotes: 0
Reputation: 8940
Combining Lamu's comments and these answers:
pandas to_sql all columns as nvarchar
write unicode data to mssql with python?
I have come up with the code below, which works. Basically, when running to_sql, I export all the object columns as NVARCHAR. This is fine in my specific example, because all the dates are datetime and not object, but could be messy in those cases where dates are stored as object.
Any suggestions on how to handle those cases, too?
from sqlalchemy.types import NVARCHAR
txt_cols = df.select_dtypes(include = ['object']).columns
df.to_sql(output_table, engine, schema='dbo', if_exists='replace', index=False, dtype = {col_name: NVARCHAR for col_name in txt_cols}
PS Note I don't see this answer as a duplicate of the others; there are some differences, like the use of df.select.dtypes
Upvotes: 5