user1008697
user1008697

Reputation: 1151

pandas with xlsxwriter encoding caused 'u' prefix to dataframe text

I am using xlsxwriter to populate dataframe data to excel via the pandas to_excel function. However, the data populated has the prefix 'u' infront of the text displayed in excel.

Output in Excel: [u'05-Sep-17'] [u'05-Sep-17']

I understand that 'u' is referring to unicode, are there any means to remove the 'u' prefix?

I have tried the following but without any success so far.

1) Handle it via regex before calling the func to_excel()

df.loc[:, 'delivery_dt'].replace({r'[^\x00-\x7F]+':''}, regex=True, inplace=True)

The print statement gave the result I wanted but when it reaches the to_excel() stage, unicode 'u' prefix appeared in the final output in excel.

Upvotes: 1

Views: 286

Answers (1)

Marjan Moderc
Marjan Moderc

Reputation: 2859

I tried to reproduce your problem, but with no success. Do you mind sharing any more detail about the source of data you are importing?

If your strings are in unicode originally and you use .to_excel('test.xls') function, it should work with no problem. If you want to force a special encoding in Excel, use an argument .to_excel('test.xls', encoding='utf16').

So, make sure that you know in what encoding is your original data (e.g. your csv files or your database). When loading to Python, always try to decode data in unicode as soon as possible. Then it should work out of the box.

df['delivery_dt'] = df['delivery_dt'].apply(lambda x: unicode(x))

Again, as always, using Python 3 makes your life much easier when it comes to non-ascii characters.

Upvotes: 1

Related Questions