Zeke

Reputation: 89

How to write Unicode output to .csv to be used in Excel?

I have a data set containing Chinese characters, which I processed using UTF-8 encoding. Part of the data looks like this:

encod   cKeyword
UTF-8   <U+5169><U+7528> <U+5305> 27 bloide herme
UTF-8   <U+593E> <U+62C9><U+934A> <U+9577> loewe
UTF-8   <U+5169><U+7528> <U+5305> <U+8FF7><U+4F60> 31 lim pashli phillip
UTF-8   <U+5305> <U+624B><U+62FF> givenchy pandora

When I use write.csv(data, "file.csv", fileEncoding = "UTF-8"), I get a .csv file that displays exactly the same thing when opened in Excel. But I need the Unicode escapes to be displayed as the Chinese characters they represent.

How can I get it to write Chinese characters instead?

Upvotes: 2

Views: 1358

Answers (1)

ab77

Reputation: 871

Your characters are written out as escaped Unicode code points (for example <U+5169>) rather than as the characters themselves.

Python 2.7.10
>>> s = '\u5169' # <U+5169> represented in unicode
>>> print s.decode('unicode_escape')
兩
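For readers on Python 3, where str.decode no longer exists, the same idea can be sketched with a regular expression and chr. This is only an illustration under the assumption that every escape has the form <U+XXXX> with hexadecimal digits; the name decode_r_escapes is made up for this example.

```python
import re

# Matches R-style escapes such as <U+5169>; the hex digits are captured.
CODE_POINT = re.compile(r"<U\+([0-9A-Fa-f]+)>")


def decode_r_escapes(text):
    """Replace each <U+XXXX> escape with the character it stands for."""
    return CODE_POINT.sub(lambda m: chr(int(m.group(1), 16)), text)


# Example with one of the values from the question
print(decode_r_escapes("<U+5169><U+7528> <U+5305> 27 bloide herme"))
```

Text that contains no escapes passes through unchanged, so the function can be applied to every field.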

In Excel, the following function will convert your code point into character representation:

=UNICHAR(HEX2DEC("5169"))

Or, here is a more end-to-end example. The following Python 2.7 code, which uses the unicodecsv module (pip install unicodecsv), converts your R output (r.csv) into Excel input (excel.csv):

import re
import unicodecsv as csv

# Matches R's escaped code points, e.g. <U+5169>; compiled once, outside the loop
p = re.compile(r'<U\+([0-9a-fA-F]+)>')

# Open both files in one with-statement so the output is flushed and closed
with open('r.csv', 'rb') as csvread, open('excel.csv', 'wb') as csvwrite:
    rows = csv.reader(csvread, delimiter='\t')
    w = csv.writer(csvwrite, encoding='utf-8')
    for row in rows:
        # Replace every <U+XXXX> in the second column with its character
        row[1] = p.sub(lambda m: unichr(int(m.group(1), 16)), row[1])
        w.writerow(row)
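A Python 3 variant of this conversion might look like the sketch below, using only the standard library. It assumes the R output is tab-delimited UTF-8, as in the code above, and it writes the result with the utf-8-sig codec, which prepends a byte order mark that Excel generally uses to recognise UTF-8. The function name convert_for_excel and its parameters are hypothetical, chosen for illustration.

```python
import csv
import re

# Matches R-style escapes such as <U+5169>; the hex digits are captured.
CODE_POINT = re.compile(r"<U\+([0-9A-Fa-f]+)>")


def convert_for_excel(src_path, dst_path, delimiter="\t"):
    """Copy a delimited file, expanding <U+XXXX> escapes in every field.

    The output is comma-separated and starts with a UTF-8 byte order mark.
    """
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8-sig") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src, delimiter=delimiter):
            writer.writerow(
                [CODE_POINT.sub(lambda m: chr(int(m.group(1), 16)), field)
                 for field in row]
            )
```

It would be called as convert_for_excel('r.csv', 'excel.csv'). Whether the resulting file opens correctly by double-clicking depends on the Excel version, so the import route may still be needed.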

Take the generated excel.csv and import it into Excel (do not just open it) by following this post.

I don't have R installed, but R may also be able to write its output in a format Excel understands; see this and this.

Hope this helps.

-- ab1

Upvotes: 1
