Reputation: 143
I am connecting to a MS SQL server using pyodbc, and I am trying to write to an Excel 2007/10 .xlsx file using openpyxl.
This is my code (Python 2.7):
import pyodbc
from openpyxl import Workbook

cnxn = pyodbc.connect(host='xxx', database='yyy', user='zzz', password='ppp')
cursor = cnxn.cursor()
sql = "SELECT TOP 10 [customer clientcode] AS Customer, \
       [customer dchl] AS DChl, \
       [customer name] AS Name, \
       ...
       [name3] AS [name 3] \
       FROM mydb \
       WHERE [customer dchl] = '03' \
       ORDER BY [customer id] ASC"
#load data
cursor.execute(sql)
#get column names from the cursor description
columns = [column[0] for column in cursor.description]
#use optimized_write because there will be about 120k rows of data
wb = Workbook(optimized_write=True, encoding='utf-8')
ws = wb.create_sheet()
ws.title = '03'
#append column names as the header row
ws.append(columns)
#append data rows to the sheet
for row in cursor:
    ws.append(row)
wb.save(filename='test.xlsx')
cnxn.close()
This works, at least up until the point I encounter a customer with, for example, the name "mún". My code does not fail: everything writes to Excel and all seems well. That is, until I actually open the Excel file, at which point Excel reports that the file is corrupted and needs to be repaired. Upon repairing the file, all data is lost. I know the code works for customers with regular (ASCII-only) names; as soon as a name contains an accented character or anything else non-ASCII, the Excel file gets corrupted.
I have tried printing a single row (with a difficult customer name). row is a tuple, and this is one of its elements: 'Mee\xf9s Tilburg'. So either writing the \xf9 (ù) character causes an error, or MS Excel cannot cope with it. I have tried various ways of converting a row to unicode (unicode(row, 'utf-8'), u''.join(row), etc.), though nothing works: either I try something idiotic resulting in an error, or the Excel file still ends up corrupted.
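To make those failures concrete, here is a minimal Python 2 sketch (the tuple contents are illustrative, mimicking one cursor row):
#a row from pyodbc is a tuple of byte strings, not unicode
row = ('Mee\xf9s Tilburg', '03')
try:
    unicode(row, 'utf-8')   #unicode() wants a string or buffer, not a tuple
except TypeError as e:
    print 'unicode(row, ...):', e
try:
    u''.join(row)           #mixing str and unicode forces an implicit ASCII
                            #decode, which chokes on the '\xf9' byte
except UnicodeDecodeError as e:
    print "u''.join(row):", e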
Any ideas?
Upvotes: 7
Views: 4981
Reputation: 7833
You can use encode() to convert unicode to a byte string:
l=[u'asd',u'qw',u'fdf',u'sad',u'sadasd']
l[4]=l[4].encode('utf8')
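If every element of the list needs converting, the same idea works per element with a list comprehension (a sketch, assuming all elements are unicode strings):
l = [u'asd', u'qw', u'fdf', u'sad', u'sadasd']
#encode every element to a UTF-8 byte string
l = [s.encode('utf8') for s in l]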
Upvotes: -1
Reputation: 143
In the end I found two solutions.
The first was converting the row given by the cursor to a list, and decoding the relevant elements within the list:
for row in cursor:
    l = list(row)
    l[5] = l[5].decode('ISO-8859-1')
    #(do this for all necessary columns)
    ws.append(l)
I figured this would be hell, because there were 6 columns needing conversion to unicode and 120k rows, though everything actually ran quite fast. In the end it became apparent that I could (and should) simply cast the data to unicode in the SQL statement (CAST(x AS nvarchar) AS y), which made the replacements unnecessary. I did not think of this at first because I assumed the data was already being supplied as unicode. My bad.
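For completeness, a sketch of that second solution (column and table names are the placeholders from the question; note that CAST to nvarchar without a length defaults to nvarchar(30), so an explicit length is safer):
sql = "SELECT TOP 10 CAST([customer name] AS nvarchar(100)) AS Name, \
       CAST([name3] AS nvarchar(100)) AS [name 3] \
       FROM mydb \
       WHERE [customer dchl] = '03' \
       ORDER BY [customer id] ASC"
cursor.execute(sql)
#pyodbc now returns unicode for the cast columns, so rows can be appended as-is
for row in cursor:
    ws.append(row)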
Upvotes: 5