zelocs

Reputation: 35

Best way to display pasted unicode characters in html?

I am trying to put together a master collection of Unicode faces and symbols. However, when I paste these characters into Notepad++, it doesn't recognize them. What is the best and most efficient way to display them? This site seems to use Adobe Flash Player. Thanks.

Upvotes: 1

Views: 2759

Answers (2)

Jukka K. Korpela

Reputation: 201886

You can use the characters as such. Whether the editor (like Notepad++) can display them is immaterial to their use in HTML documents, which will be rendered by browsers.

Make sure you set the document encoding to UTF-8 and declare it. See the W3C page Character encodings.

The most difficult part is to set fonts so that the characters will be displayed in different environments. There is no single font that covers all Unicode characters, and there is no font that exists on all systems. Using downloadable fonts (web fonts), you can cover most rendering situations, but not all characters with a single font. See my Guide to using special characters in HTML.

You should first consider how realistic your project is; a “master collection” is a rather ambitious goal, and such collections already exist, e.g. at FileFormat.info and at Codepoints.net – and, of course, at Unicode.org.

Upvotes: 1

Todd Patterson

Reputation: 362

Is the end goal to display these chars (faces, symbols) on a web page (faces.html) or simply within a text file (faces.txt)?

If you simply want to display them in a text file, you most likely need to change the encoding of your Notepad++ file. By default, new Notepad++ files are encoded in ANSI, which will garble any multibyte chars that you paste. I'm using Notepad++ v5.8.7; hopefully the menus and options are not too different in your version:

  • File > New
  • Encoding > Encode in UTF-8

Then try pasting one of your multibyte faces/symbols into the file. Assuming Notepad++ has the font necessary to display this char, it should render correctly.
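To see why the encoding matters, here is a rough Python sketch (not part of Notepad++ itself). It assumes that "ANSI" on your machine means Windows-1252 (`cp1252`); the actual code page depends on your system locale, so treat that as an assumption.

```python
# Sketch: compare how UTF-8 and a single-byte "ANSI" code page handle a symbol.
# Assumption: "ANSI" is Windows-1252 (cp1252); your system code page may differ.

SYMBOL = "\u203b"  # the "kome" reference mark used later in this answer

# UTF-8 can represent any Unicode character; this one takes three bytes.
utf8_bytes = SYMBOL.encode("utf-8")

# cp1252 has no slot for this character, so strict encoding fails.
try:
    SYMBOL.encode("cp1252")
    fits_in_cp1252 = True
except UnicodeEncodeError:
    fits_in_cp1252 = False

# With errors="replace" the character is silently swapped for a question mark,
# which is roughly the kind of garbling you see in an ANSI-encoded file.
lossy_bytes = SYMBOL.encode("cp1252", errors="replace")

print(utf8_bytes, fits_in_cp1252, lossy_bytes)
```

The point is only that a single-byte code page cannot hold most symbols, while UTF-8 can.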

However, if your end goal is to display these chars on a web page, you can use the equivalent HTML "numeric character reference" instead of the raw char. In your HTML file, you refer to the char by its hex or decimal code. For example, here is a Japanese "kome" symbol:

  • HTML entity = &#8251

(I removed the final semicolon to prevent HTML rendering within SO page)

A "numeric character reference" refers to a char by its Unicode code point. There are various ways to get the Unicode code point. One quick way is to copy-paste your char into the awesome search page here: http://www.fileformat.info/info/unicode/char/search.htm

That will give you all the details for the char, including its HTML numeric char ref.
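If you have many chars to convert, you could also compute the references yourself. The following is a minimal Python sketch, not anything the site above provides; the function name `numeric_char_refs` is made up for illustration.

```python
import html


def numeric_char_refs(text):
    """Return (decimal_ref, hex_ref) for each code point in text.

    Hypothetical helper for illustration: ord() gives the Unicode code
    point, which is the number used in a numeric character reference.
    """
    refs = []
    for ch in text:
        code_point = ord(ch)
        refs.append((f"&#{code_point};", f"&#x{code_point:X};"))
    return refs


# "\u203b" is the "kome" reference mark, "\u263a" is a smiling face.
for decimal_ref, hex_ref in numeric_char_refs("\u203b\u263a"):
    # html.unescape() turns a reference back into the character,
    # which is a handy round-trip check.
    assert html.unescape(decimal_ref) == html.unescape(hex_ref)
    print(decimal_ref, hex_ref)
```

For the kome symbol this gives the decimal form shown above (with the final semicolon) and the hex form `&#x203B;`.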

Ensure you declare the charset of your webpage as UTF-8 like this:

<meta charset="utf-8">

You can test the rendering of your faces/symbols by saving the file (faces.html) to your local computer, opening a new browser tab, and dragging the file onto the tab. The browser should render the HTML numeric character references as chars.
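If you would rather generate the test file than type it by hand, something like the following Python sketch could work. The helper names `build_page` and `write_page`, the page title, and the list layout are all assumptions for illustration; only the file name faces.html and the meta tag come from the text above.

```python
from html import escape


def build_page(symbols, title="Unicode faces and symbols"):
    """Build a small HTML page listing the given symbols (hypothetical helper)."""
    # escape() protects characters such as "<" and "&" that are special in HTML.
    items = "\n".join(f"    <li>{escape(s)}</li>" for s in symbols)
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        "<head>\n"
        '  <meta charset="utf-8">\n'
        f"  <title>{escape(title)}</title>\n"
        "</head>\n"
        "<body>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>\n"
    )


def write_page(path, symbols):
    """Write the page as UTF-8 so the bytes match the declared charset."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(build_page(symbols))


if __name__ == "__main__":
    # "\u203b" is the kome mark, "\u263a" a smiling face.
    write_page("faces.html", ["\u203b", "\u263a"])
```

Because the file is both saved as UTF-8 and declared as UTF-8, the raw chars should display without needing numeric references, provided a suitable font is available.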

Hope this helps.

Upvotes: 3
