CHANist

Reputation: 1412

Reading a CSV file with Chinese characters [one character cannot be shown]

When I open a CSV file containing Chinese characters in Microsoft Excel, TextWrangler, or Sublime Text, some Chinese characters are not displayed properly. I have no idea why this is the case.

Specifically, the CSV file can be found at the following link: https://www.hkex.com.hk/eng/plw/csv/List_of_Current_SEHK_EP.CSV

One of the words that cannot be displayed correctly is shown here: [image]

As you can see, a ? appears in place of one character.

Using the Mac file command, as suggested by http://osxdaily.com/2015/08/11/determine-file-type-encoding-command-line-mac-os-x/, tells me that the CSV file is encoded as UTF-16LE.

I am wondering what the problem is: why can't I read that specific text? Is it related to the encoding, or to my laptop's settings? Neither macOS nor Windows 10 on the Mac (via Parallels Desktop) displays the word correctly.

Thanks for the help. I really want to know why this specific text cannot be displayed properly.

Upvotes: 2

Views: 862

Answers (1)

bobince

Reputation: 536755

The actual name of HSBC Broking Securities is:

滙豐金融證券(香港)有限公司

The first character, U+6ED9 (滙), is one of the troublesome HKSCS characters: characters that weren't available in standard pre-Unicode Big-5 and were grafted on later in incompatible ways.

For a while there was an unfortunate convention of mapping these characters to Private Use Area characters when converting to Unicode. This data was presumably converted back then and is now mangled, with U+6ED9 replaced by the Private Use Area character U+E05E.
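To check whether a given piece of text is affected, one option is to scan it for code points in the Basic Multilingual Plane's Private Use Area (U+E000 to U+F8FF). The following is a minimal sketch, not a definitive tool; the function name find_pua_chars is made up for illustration, and the sample string is the company name with its first character swapped for U+E05E as described above.

```python
def find_pua_chars(text):
    """Return (index, 'U+XXXX') pairs for each BMP Private Use Area character."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if 0xE000 <= ord(ch) <= 0xF8FF  # BMP Private Use Area range
    ]


# The mangled form of the name: U+E05E where U+6ED9 should be.
sample = "\ue05e豐金融證券(香港)有限公司"
print(find_pua_chars(sample))  # [(0, 'U+E05E')]
```

For the real file you would first decode it, for example with open(path, encoding="utf-16"), assuming the file command's UTF-16 report is right and the file starts with a byte order mark.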

For PUA cases that you're sure are the result of the HKSCS compatibility bodge, you can convert back to proper Unicode using this table.
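As a rough sketch of what that conversion could look like in Python: build a dictionary from PUA code points to their proper Unicode equivalents and apply it with str.translate. The names HKSCS_PUA_TO_UNICODE and fix_hkscs_pua are hypothetical, and the dictionary below holds only the single pair mentioned in this answer; the remaining entries would have to be filled in from the mapping table.

```python
# Hypothetical mapping: only this one pair comes from the answer itself.
# All other entries would need to be loaded from the HKSCS mapping table.
HKSCS_PUA_TO_UNICODE = {
    0xE05E: 0x6ED9,  # Private Use Area code point -> 滙
}


def fix_hkscs_pua(text):
    """Replace known HKSCS-derived PUA characters with proper Unicode ones."""
    # str.translate accepts a dict mapping code points to code points.
    return text.translate(HKSCS_PUA_TO_UNICODE)


mangled = "\ue05e豐金融證券(香港)有限公司"
print(fix_hkscs_pua(mangled))  # 滙豐金融證券(香港)有限公司
```

Characters that are not in the dictionary pass through unchanged, so PUA characters that are there for some other reason are left alone unless you add them to the mapping.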

Upvotes: 3
