Alex Huszagh
Alex Huszagh

Reputation: 14644

Pandas UTF-8 Write to Excel Issue on Windows Only

So I'm having an issue with Python 2.7 and the ExcelWriter in Pandas (furnished, I believe, by OpenPyXl).

Basically, on a Linux or Mac OS X system, I have had no issues with my code, but on Windows, I've tried multiple options. After loading (using 'utf-8' encoding) the data into the array, doing a variety of calculations, the data writes to CSV perfectly fine, but will not work with the ExcelWriter.

Here's a test example that exemplifies my problem. First I make the dataframe (loaded, but I've verified that my actual code has the properly encoded data right before the print step).

import pandas as pd
df = pd.DataFrame(index=range(2), columns=range(2))
df.loc[1,1] = 'α'

CSV Save Attempt:

df.to_csv('test.csv')

Excel Save Attempt 1:

ew = pd.ExcelWriter('test.xlsx')
df.to_excel(ew, 'test')
ew.save()

Excel Save Attempt 2:

ew = pd.ExcelWriter('test.xlsx', options={'encoding':'utf-8'})
df.to_excel(ew, 'test')
ew.save()

In both cases, all 3 save attempts work on a Ubuntu install and Mac OS X operating system (Yosemite), but only the Excel writing does not work on a Windows 7 install with Python 2.7. I've tried it with OpenPyXl versions 1.7.0 and 1.8.6, Pandas versions 0.15.0 and 0.15.2 on Ubuntu (Max OS X is less important), and OpenPyXl versions 1.7.0 and Pandas version 0.14.1 on Windows.

It seems to be OpenPyXl independent and independent of Pandas, so is there something obvious I'm overlooking? Is this a glitch? It has a similar issue on my friend's Windows 7 install with similar installs.

Both Windows installs have the SciPy dependencies installed via the SciPy Win32 superpack for Python 2.7.

Upvotes: 0

Views: 2205

Answers (1)

Alex Huszagh
Alex Huszagh

Reputation: 14644

I managed to solve it with the sys reload method. Appending this code at the top worked:

import sys
reload(sys)
sys.setdefaultencoding('UTF8')

This is explained in more depth here: https://stackoverflow.com/a/17628350/4131059 Manually setting the encoding for the dataframe does nothing for the IO, so setting the default encoding was the logical next step. And it worked.

Upvotes: 1

Related Questions