user3105222

How to properly encode HTML entities in emails? e.g. &nearr; for Gmail

I modified some emails I send to get rid of images and replaced them with special Unicode characters. For example, I had an arrow image and replaced it with &nearr;, wrapped in a <span> to give it the color I want.

When I look at the source in Gmail (3 dots > Show Original) I see this:

...
--1234567890123456789012345678
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.=
w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns=3D"http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3DUTF-8" />
</head>
<body>
...
...&nbsp;<span style=3D"font-family:arial,verdana;font-weight:bold;color:#209a20">&nearr;</span>&nbsp;...
...
</body>
</html>
--1234567890123456789012345678--

Which is what I'd expect since that's what I wrote in my code.
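To rule out the transfer encoding as the culprit, here is a minimal sketch that decodes a fragment of the source above with Python's standard `quopri` module. The `raw` bytes are a shortened, hypothetical excerpt of my message, not the full body.

```python
import quopri

# Shortened excerpt of the quoted-printable body shown above (assumption:
# line endings are plain "\n" here; real messages normally use "\r\n").
raw = (
    b'"http://www.=\n'
    b'w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
    b'<span style=3D"color:#209a20">&nearr;</span>'
)

# "=3D" decodes to "=", and a trailing "=" is a soft line break that is removed.
decoded = quopri.decodestring(raw).decode("utf-8")
print(decoded)
```

The entity comes out of the decoding step unchanged, so quoted-printable only rewrites the `=` signs and the long lines.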

Now the problem is that the Gmail web interface does not render the arrow; it shows the literal text &nearr; instead.

What am I doing wrong? Isn't UTF-8 a Unicode encoding that should support this character?

I would understand if some of these special characters are displayed as square boxes or something, but I do not understand how they can remain encoded while the &nbsp; turns into a space correctly.

It also makes me question whether other email clients will display these correctly (would love feedback on that too).

Upvotes: 1

Views: 3357

Answers (2)

Nathan

Reputation: 5259

Try the numeric character reference (the "HTML code") rather than the named HTML entity.

So &#8599; for the north-east arrow, as per https://www.toptal.com/designers/htmlarrows/arrows/north-east-arrow/

Best reference for this is usually https://unicode-table.com/en/
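If the email body is generated by code, the swap can be automated. Below is a minimal sketch in Python, assuming the body is available as a string; `to_numeric_refs` is a hypothetical helper name, and the lookup table is the standard library's `html.entities.html5`.

```python
import re
from html.entities import html5  # maps names such as "nearr;" to their characters

def to_numeric_refs(html_text: str) -> str:
    """Replace named entities (&nearr;) with decimal references (&#8599;)."""
    def repl(match: re.Match) -> str:
        chars = html5.get(match.group(1) + ";")
        if chars is None:
            # Unknown name: leave the text exactly as it was.
            return match.group(0)
        # A few entities expand to two code points, so emit one reference each.
        return "".join(f"&#{ord(c)};" for c in chars)

    # Only matches names that start with a letter, so existing &#...; references are skipped.
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, html_text)

print(to_numeric_refs("&nbsp;<span>&nearr;</span>"))
```

This converts every named entity it recognises, including &nbsp; and &amp;, which should be harmless since the numeric forms mean the same thing.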

Upvotes: 0

Rick James

Reputation: 142433

In the 1950s, computers could handle only capital letters, digits and some punctuation.

Before 1970, EBCDIC was invented (only to later die out) for handling lower case and a few more punctuation characters.

Then came a plethora of encodings to handle European accents, Cyrillic, Greek, and eventually Chinese. (There are some interesting stories on the invention of typewriters for handling Chinese!)

Eventually, the Unicode group got together and slowly created a universal standard. The standard has been evolving for a few decades and continues to grow; emoji are a big addition that is still ongoing.

But, meanwhile, how does one put Emoji, etc, in URLs, type them on a keyboard, etc, etc? Those standards are lagging way behind. So, there are kludges in place.

  • HTML allows "entities", such as &nearr; for that arrow.
  • Putting such in a URL would require something like %E2%86%97.
  • Several other escaping schemes are likewise based on the hex of the UTF-8 bytes (E2 86 97 for this arrow).
  • Many programming languages allow an escape such as \u2197, which is based on the hex value of the "codepoint" (U+2197, which is 8599 in decimal). (Java goes in that direction.)
  • MySQL INSERT: UNHEX('E28697')
  • Keyboards -- good luck.
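
To see how these spellings relate to each other, here is a small Python sketch that derives them all from the one character. It uses only the standard library; the labels in the `forms` dictionary are my own wording, not standard terms.

```python
import html
from urllib.parse import quote

arrow = html.unescape("&nearr;")   # named HTML entity -> the character itself
codepoint = ord(arrow)             # 8599 in decimal, 0x2197 in hex

forms = {
    "decimal reference": f"&#{codepoint};",                     # &#8599;
    "hex reference": f"&#x{codepoint:X};",                      # &#x2197;
    "URL percent-encoding": quote(arrow),                       # %E2%86%97
    "UTF-8 bytes as hex": arrow.encode("utf-8").hex().upper(),  # E28697
    "backslash-u escape": f"\\u{codepoint:04X}",                # \u2197
}

for name, value in forms.items():
    print(f"{name:22} {value}")
```

The HTML references and the backslash escape are built from the codepoint, while the URL and MySQL forms are built from the UTF-8 bytes.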

I don't know of anything other than HTML that reacts favorably to &nearr;

Ever notice a + in a URL? In the query string, that is the encoding for a single space. (%20 also works there.)
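
A quick way to check this is Python's `urllib.parse`, which has both flavours. This is only a sketch of the standard-library behaviour; the sample strings are made up.

```python
from urllib.parse import quote, quote_plus, unquote_plus

with_plus = quote_plus("north east")      # form-style: space becomes "+"
with_percent = quote("north east")        # plain percent-encoding: space becomes "%20"
literal_plus = quote_plus("1+1")          # a real "+" has to be escaped as "%2B"
round_trip = unquote_plus("north+east%20arrow")  # both spellings decode to a space

print(with_plus, with_percent, literal_plus, round_trip)
```

Because + stands for a space in that context, a literal plus sign has to be sent as %2B.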

Upvotes: 0
