Reputation: 143
I am connecting to a MS SQL server using pyodbc, and I am trying to write to an Excel 2007/10 .xlsx file using openpyxl.
This is my code (Python 2.7):
import pyodbc
from openpyxl import Workbook

cnxn = pyodbc.connect(host='xxx', database='yyy', user='zzz', password='ppp')
cursor = cnxn.cursor()
sql = "SELECT TOP 10 [customer clientcode] AS Customer, \
       [customer dchl] AS DChl, \
       [customer name] AS Name, \
       ...
       [name3] AS [name 3] \
       FROM mydb \
       WHERE [customer dchl] = '03' \
       ORDER BY [customer id] ASC"
#load data
cursor.execute(sql)
#get column names from the cursor description
columns = [column[0] for column in cursor.description]
#use optimized_write because there will be about 120k rows of data
wb = Workbook(optimized_write=True, encoding='utf-8')
ws = wb.create_sheet()
ws.title = '03'
#append column names as the header row
ws.append(columns)
#append data rows to the sheet
for row in cursor:
    ws.append(row)
wb.save(filename='test.xlsx')
cnxn.close()
This works, at least up until the point I encounter a customer with, for example, the name "mún". My code does not fail: everything writes to Excel and all seems well. That is, until I actually open the Excel file, at which point Excel reports that the file is corrupted and needs to be repaired. Upon repairing the file, all data is lost. I know the code works for customers with regular (ASCII-only) names; as soon as a name contains an accented character or anything else non-ASCII, the Excel file gets corrupted.
I have tried printing a single row (with a difficult customer name). row is a tuple, and this is one of its elements: 'Mee\xf9s Tilburg'. So either writing the \xf9 (ù) character causes an error, or MS Excel cannot cope with it. I have tried various ways of converting a row to unicode (unicode(row, 'utf-8'), u''.join(row), etc.), though nothing works: either I try something idiotic resulting in an error, or the Excel file still ends up corrupted.
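To make those failures concrete, here is a minimal Python 2 sketch (the tuple contents are illustrative, mimicking one cursor row):
#a row from pyodbc is a tuple of byte strings, not unicode
row = ('Mee\xf9s Tilburg', '03')
try:
    unicode(row, 'utf-8')   #unicode() wants a string or buffer, not a tuple
except TypeError as e:
    print 'unicode(row, ...):', e
try:
    u''.join(row)           #mixing str and unicode forces an implicit ASCII
                            #decode, which chokes on the '\xf9' byte
except UnicodeDecodeError as e:
    print "u''.join(row):", e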
Any ideas?
Upvotes: 7
Views: 4981
Reputation: 7833
You can use encode() to convert unicode to a byte string:
l=[u'asd',u'qw',u'fdf',u'sad',u'sadasd']
l[4]=l[4].encode('utf8')
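If every element of the list needs converting, the same idea works per element with a list comprehension (a sketch, assuming all elements are unicode strings):
l = [u'asd', u'qw', u'fdf', u'sad', u'sadasd']
#encode every element to a UTF-8 byte string
l = [s.encode('utf8') for s in l]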
Upvotes: -1
Reputation: 143
In the end I found two solutions.
The first was converting the row given by the cursor to a list, and decoding the relevant elements within the list:
for row in cursor:
    l = list(row)
    l[5] = l[5].decode('ISO-8859-1')
    #(do this for all necessary columns)
    ws.append(l)
I figured this would be hell, because there were 6 columns needing conversion to unicode and 120k rows, though everything actually ran quite fast. In the end it became apparent that I could (and should) simply cast the data to unicode in the SQL statement (CAST(x AS nvarchar) AS y), which made the replacements unnecessary. I did not think of this at first because I assumed the data was already being supplied as unicode. My bad.
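For completeness, a sketch of that second solution (column and table names are the placeholders from the question; note that CAST to nvarchar without a length defaults to nvarchar(30), so an explicit length is safer):
sql = "SELECT TOP 10 CAST([customer name] AS nvarchar(100)) AS Name, \
       CAST([name3] AS nvarchar(100)) AS [name 3] \
       FROM mydb \
       WHERE [customer dchl] = '03' \
       ORDER BY [customer id] ASC"
cursor.execute(sql)
#pyodbc now returns unicode for the cast columns, so rows can be appended as-is
for row in cursor:
    ws.append(row)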
Upvotes: 5