Christos Hayward

Reputation: 5993

How can I transcode US-ASCII HTML entities, e.g. &eacute;, to UTF-8's é under Linux?

A quick web search will confirm that US-ASCII is a subset of UTF-8, but what I have not yet found is how to convert &foo; and &#123; to their corresponding native UTF-8 characters.

I know that 7-bit US-ASCII is unchanged in UTF-8, but I haven't yet seen a program that filters a file and converts each &foo; to the character it would naturally be in UTF-8.

Upvotes: 0

Views: 293

Answers (1)

MichaEL

Reputation: 94

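You can use html_entity_decode($s, ENT_QUOTES | ENT_HTML5, "UTF-8") in PHP (the encoding is the third argument; the second is the flags bitmask) or html.unescape(s) in Python. Both decode named references such as &eacute; as well as numeric ones such as &#123;.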

  1. https://www.php.net/manual/en/function.html-entity-decode.php
  2. https://docs.python.org/3/library/html.html#html.unescape
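
For the Python route, here is a minimal sketch of a command-line converter. It assumes the input file is pure ASCII and small enough to read into memory; the names decode_entities and convert_file are just illustrative helpers, not part of any library.

```python
#!/usr/bin/env python3
"""Decode HTML character references in an ASCII file and write UTF-8."""
import html
import sys


def decode_entities(text: str) -> str:
    # html.unescape handles named (&eacute;), decimal (&#233;)
    # and hexadecimal (&#xE9;) character references.
    return html.unescape(text)


def convert_file(src: str, dst: str) -> None:
    # Hypothetical helper: read src as ASCII (assumption), write dst as UTF-8.
    with open(src, encoding="ascii") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write(decode_entities(text))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 decode.py input.html output.html
    convert_file(sys.argv[1], sys.argv[2])
```

One caveat: html.unescape also decodes &lt;, &gt; and &amp;, so running it over a whole HTML document can turn escaped text into markup. If the file is real HTML rather than plain text with entities, you would probably want to leave those few references alone.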

Upvotes: 1
