How to write Unicode output to .csv to be used in Excel?

Question

I have a data set containing Chinese characters, which I processed using UTF-8 encoding. Part of the data looks like this:

encod   cKeyword
UTF-8     27 bloide herme
UTF-8      loewe
UTF-8      31 lim pashli phillip
UTF-8     givenchy pandora

When I use write.csv(data, "file.csv", fileEncoding = "UTF-8"), the resulting .csv file shows exactly the same thing when opened in Excel, but I need each Unicode code point to be displayed as its Chinese character.

How can I get it to write Chinese characters instead?

ab77 · Accepted Answer

Your characters are being written out as Unicode code points rather than as the characters themselves.

Python 2.7.10
>>> s = '\u5169' # code point written as an escape sequence
>>> print s.decode('unicode_escape')
兩
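
The session above is Python 2, where str has a .decode method. In Python 3 strings are already Unicode, so a minimal equivalent, assuming the escape arrives as the literal characters \u5169 (for example, read from a file), encodes to bytes first and then decodes with the same unicode_escape codec:

```python
# Python 3: the text contains the six literal characters \u5169
s = '\\u5169'

# Encode to bytes, then let the 'unicode_escape' codec interpret the \uXXXX sequence
ch = s.encode('ascii').decode('unicode_escape')
print(ch)  # 兩
```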

In Excel, the following formula converts a code point into the character it represents (UNICHAR requires Excel 2013 or later):

=UNICHAR(HEX2DEC("5169"))
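
The formula works in two steps: HEX2DEC turns the hexadecimal code point into a decimal number, and UNICHAR maps that number to a character. A Python 3 sketch of the same arithmetic, for illustration only (the helper name code_point_to_char is hypothetical):

```python
def code_point_to_char(hex_digits):
    # Equivalent of HEX2DEC: parse the hexadecimal string as a base-16 integer
    n = int(hex_digits, 16)
    # Equivalent of UNICHAR: return the character with that code point
    return chr(n)

print(int('5169', 16))             # 20841
print(code_point_to_char('5169'))  # 兩
```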

For a more end-to-end approach, the following Python 2.7 code uses the unicodecsv module (pip install unicodecsv) to convert your R output (r.csv) into a file Excel can read (excel.csv):

import re
import unicodecsv as csv  # pip install unicodecsv

# R writes characters it cannot encode as escapes such as <U+5169>
p = re.compile(r'<U\+([0-9A-Fa-f]{4})>')

with open('r.csv', 'rb') as csvread, open('excel.csv', 'wb') as csvwrite:
    # Assumes tab-separated input; write.csv output is comma-separated, so use delimiter=',' for that
    rows = csv.reader(csvread, delimiter='\t', encoding='utf-8')
    w = csv.writer(csvwrite, encoding='utf-8')
    for row in rows:
        # Replace each <U+XXXX> token in the second column with its character
        row[1] = p.sub(lambda m: unichr(int(m.group(1), 16)), row[1])
        w.writerow(row)
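
On Python 3 the standard csv module handles Unicode directly, so unicodecsv is not needed. The following is a minimal sketch under the same assumptions as the Python 2.7 code (tab-separated input, four-digit <U+XXXX> escapes in the second column); the function names unescape_r_codes and convert are hypothetical:

```python
import csv
import re

# Assumed escape format: <U+XXXX> with four hex digits
R_ESCAPE = re.compile(r'<U\+([0-9A-Fa-f]{4})>')

def unescape_r_codes(text):
    """Replace every <U+XXXX> token in text with the character it names."""
    return R_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

def convert(src='r.csv', dst='excel.csv', delimiter='\t'):
    # The csv module expects files opened with newline=''
    with open(src, newline='', encoding='utf-8') as fin, \
         open(dst, 'w', newline='', encoding='utf-8') as fout:
        writer = csv.writer(fout)  # comma-separated output
        for row in csv.reader(fin, delimiter=delimiter):
            if len(row) > 1:
                row[1] = unescape_r_codes(row[1])
            writer.writerow(row)

print(unescape_r_codes('<U+5169> loewe'))  # 兩 loewe
```

Using re.sub with a replacement function avoids building an escape string and decoding it again, as the Python 2.7 version did originally.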

Then import the generated excel.csv into Excel (rather than just opening it), following this post.

I don't have R installed, but it may also be possible to make R write output in a format Excel understands; see this and this.
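
On the Python side, one commonly used way to get a file Excel understands without the import step is to save it with a UTF-8 byte-order mark (BOM), which Excel generally uses to detect UTF-8 when a .csv is opened directly. A Python 3 sketch using the standard utf-8-sig codec; the helper name save_for_excel is hypothetical, and it is worth confirming the behaviour with your own Excel version:

```python
import csv

def save_for_excel(path, rows):
    # 'utf-8-sig' writes the UTF-8 byte-order mark (EF BB BF) before the data
    with open(path, 'w', newline='', encoding='utf-8-sig') as f:
        csv.writer(f).writerows(rows)

save_for_excel('excel.csv', [['encod', 'cKeyword'], ['UTF-8', '\u5169 loewe']])

with open('excel.csv', 'rb') as f:
    print(f.read(3))  # b'\xef\xbb\xbf'
```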

Hope this helps.

-- ab1
