Reputation: 89
I have a data set containing Chinese characters that I processed using UTF-8 encoding. Part of the data looks like this:
encod cKeyword
UTF-8 <U+5169><U+7528> <U+5305> 27 bloide herme
UTF-8 <U+593E> <U+62C9><U+934A> <U+9577> loewe
UTF-8 <U+5169><U+7528> <U+5305> <U+8FF7><U+4F60> 31 lim pashli phillip
UTF-8 <U+5305> <U+624B><U+62FF> givenchy pandora
When I use write.csv(data, "file.csv", fileEncoding = "UTF-8"), I get a .csv file that displays exactly the same escaped text when opened in Excel. But I need the Unicode escapes to be displayed as the actual Chinese characters.
How can I get it to write Chinese characters instead?
Upvotes: 2
Views: 1358
Reputation: 871
Your characters are represented as escaped Unicode code points (for example, <U+5169>) rather than as the characters themselves.
Python 2.7.10
>>> s = '\u5169' # <U+5169> represented in unicode
>>> print s.decode('unicode_escape')
兩
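For readers on Python 3, where str no longer has a decode method, the same idea can be sketched with a regular expression and chr. This is only an illustration: the helper name unescape_codepoints is made up here and is not part of any library.

```python
import re

# Matches R-style escaped code points such as <U+5169>
CODEPOINT = re.compile(r"<U\+([0-9A-Fa-f]+)>")

def unescape_codepoints(text):
    """Replace every <U+XXXX> token with the character it names."""
    # int(..., 16) parses the hex digits; chr() turns the number into a character
    return CODEPOINT.sub(lambda m: chr(int(m.group(1), 16)), text)

# Sample value taken from the question's first data row
result = unescape_codepoints("<U+5169><U+7528> <U+5305> 27")
# result == "兩用 包 27"
```

Text without any <U+XXXX> tokens passes through unchanged, so the helper can be applied to a whole column without checking each cell first.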
In Excel, the following formula converts a code point into its character (the hex value is quoted so that values containing letters, such as 593E, also work):
=UNICHAR(HEX2DEC("5169"))
Or, here is a more end-to-end example. The following Python 2.7 code, using the unicodecsv module (pip install unicodecsv), converts your R output (r.csv) into Excel input (excel.csv):
import re
import unicodecsv as csv

# Matches R's escaped code points, e.g. <U+5169>
p = re.compile(r'<U\+([0-9a-fA-F]+)>')

# Both files are closed automatically when the with block ends
with open('r.csv', 'rb') as csvread, open('excel.csv', 'wb') as csvwrite:
    rows = csv.reader(csvread, delimiter='\t')  # input is read as tab-delimited
    w = csv.writer(csvwrite, encoding='utf-8')
    for row in rows:
        for match in p.finditer(row[1]):
            # Build the escape sequence (e.g. \u5169) and decode it to a character
            s = '\\u' + match.group(1)
            row[1] = row[1].replace(match.group(), s.decode('unicode_escape'))
        w.writerow(row)
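The script above targets Python 2.7. A rough Python 3 sketch of the same conversion, using only the standard csv module, might look like the following. The function name convert_csv and its parameters are hypothetical, and writing with the utf-8-sig codec (which prepends a byte order mark) is an assumption about what helps Excel detect UTF-8 when the file is opened directly; it may not apply to every Excel version.

```python
import csv
import re

# Matches R-style escaped code points such as <U+5169>
CODEPOINT = re.compile(r"<U\+([0-9A-Fa-f]+)>")

def convert_csv(src_path, dst_path, column=1, delimiter="\t"):
    """Copy src_path to dst_path, unescaping <U+XXXX> tokens in one column."""
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8-sig") as dst:
        writer = csv.writer(dst)  # comma-separated output, as in the script above
        for row in csv.reader(src, delimiter=delimiter):
            if len(row) > column:  # skip rows that lack the target column
                row[column] = CODEPOINT.sub(
                    lambda m: chr(int(m.group(1), 16)), row[column])
            writer.writerow(row)
```

Calling convert_csv('r.csv', 'excel.csv') would mirror the file names used above. If the R output is comma-separated rather than tab-separated, pass delimiter=','.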
Take the generated excel.csv and import it into Excel (rather than just opening it), following this post.
I don't have R installed, but it may also be possible for it to write output in the format Excel understands, see this and this.
Hope this helps.
-- ab1
Upvotes: 1