Reputation: 8970

How to export UTF8 characters from pandas to MS SQL

I am trying to export a table from pandas to a Microsoft SQL Server Express database.

Pandas reads a CSV file encodes as utf8. If I do df.head(), I can see that pandas shows the foreign characters correctly (they're Greek letters)

However, after exporting to SQL, those characters appear as combinations of question marks and zeros.

What am I doing wrong?

I can't find that to_sql() has any option to set the encoding. I guess I must change the syntax when setting up the SQL engine, but how exactly?

This is what I have been trying:

import numpy as np
import pandas as pd
from sqlalchemy import create_engine, MetaData, Table, select
import sqlalchemy as sqlalchemy


ServerName = my_server_name
Database = my_database
params = '?driver=SQL+Server+Native+Client+11.0'
engine = create_engine('mssql+pyodbc://' + ServerName + '/'+ Database + params, encoding ='utf_8', fast_executemany=True )
connection = engine.raw_connection()
cursor = connection.cursor()

file_name = my_file_name

df = pd.read_csv(file_name, encoding='utf_8', na_values=['null','N/A','n/a', ' ','-']  , dtype = field_map, thousands =',' )

print(df[['City','Municipality']].head())  # This works

Upvotes: 1

Answers (2)

MR.Max

Reputation: 66

In df.to_sql specify type for this columns. Use this

dtype= {'column_name1': sqlalchemy.NVARCHAR(length=50), 'column_name2': sqlalchemy.types.NVARCHAR(length=70)}

Upvotes: 0

Pythonista anonymous

Reputation: 8970

Combining Lamu's comments and these answers:

pandas to_sql all columns as nvarchar

write unicode data to mssql with python?

I have come up with the code below, which works. Basically, when running to_sql, I export all the object columns as NVARCHAR. This is fine in my specific example, because all the dates are datetime and not object, but could be messy in those cases where dates are stored as object.

Any suggestions on how to handle those cases, too?

from sqlalchemy.types import NVARCHAR
txt_cols = df.select_dtypes(include = ['object']).columns
df.to_sql(output_table, engine, schema='dbo', if_exists='replace', index=False, dtype = {col_name: NVARCHAR for col_name in txt_cols}

PS Note I don't see this answer as a duplicate of the others; there are some differences, like the use of df.select.dtypes

Upvotes: 5

How to export UTF8 characters from pandas to MS SQL

Answers (2)

Related Questions