His

How to read a file containing UTF-8 hex-encoded characters and convert them to HTML hexadecimal character references?

I have a file containing UTF-8 characters written as \xhh hex escape sequences, like this:

<root>
<element>1 \xc3\x97 2 = 2</element>
</root>

I want to read the file, transform the \xhh byte sequences into the equivalent HTML hexadecimal character references (one reference per decoded character), and then write the result to a new file. So, given a file with the above contents, the new file should look like this:

<root>
<element>1 &#xd7; 2 = 2</element>
</root>
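For reference, here is why the two escapes \xc3\x97 become the single reference &#xd7;: they are the two bytes of one UTF-8 sequence. The lead byte 110xxxxx contributes 5 payload bits, the continuation byte 10xxxxxx contributes 6, and concatenating them gives code point U+00D7 (MULTIPLICATION SIGN, ×).

```latex
\begin{aligned}
\texttt{0xC3} &= 110\,00011_2 &&\Rightarrow \text{lead byte, payload } 00011_2 \\
\texttt{0x97} &= 10\,010111_2 &&\Rightarrow \text{continuation byte, payload } 010111_2 \\
\text{code point} &= 00011\,010111_2 = 11010111_2 = \texttt{0xD7} = 215
\end{aligned}
```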

Thanks!

Answers (1)

tchrist

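Assuming you’ve decoded the input stream as UTF-8 (for example with the :utf8 or :encoding(UTF-8) layer), this substitution will fix the data by replacing every non-ASCII character with a hexadecimal character reference: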

s/([^\x00-\x7F])/sprintf "&#x%x;", ord $1/ge;
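The substitution above assumes the file holds real UTF-8 bytes. The question's sample, however, shows literal text such as \xc3\x97 (a backslash, an x and two hex digits per byte). A minimal sketch, assuming the escapes really are literal text and that a file-to-file script is wanted: first turn each run of \xhh escapes back into bytes and decode them as UTF-8, then apply the same substitution. The helper name escapes_to_char_refs and the command-line usage are hypothetical, not part of the answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not from the answer): converts literal "\xhh" escape
# text into HTML hexadecimal character references, one per decoded character.
sub escapes_to_char_refs {
    my ($line) = @_;

    # Step 1: turn each run of literal "\xhh" escapes back into raw bytes and
    # decode those bytes as UTF-8, e.g. "\xc3\x97" -> U+00D7.
    $line =~ s{((?:\\x[0-9A-Fa-f]{2})+)}{
        my $run   = $1;    # copy before the inner match overwrites $1
        my $bytes = join '', map { chr(hex($_)) } $run =~ /\\x([0-9A-Fa-f]{2})/g;
        decode('UTF-8', $bytes);
    }ge;

    # Step 2: the answer's substitution: each non-ASCII character becomes
    # &#x...; using its code point in lowercase hex.
    $line =~ s/([^\x00-\x7F])/sprintf "&#x%x;", ord $1/ge;
    return $line;
}

# Assumed usage: perl fix_escapes.pl input.xml output.xml
if (@ARGV == 2) {
    my ($in_file, $out_file) = @ARGV;
    # Decoding the input as UTF-8 also handles any real non-ASCII characters.
    open my $in,  '<:encoding(UTF-8)', $in_file  or die "Cannot read $in_file: $!";
    open my $out, '>', $out_file                 or die "Cannot write $out_file: $!";
    while (my $line = <$in>) {
        print {$out} escapes_to_char_refs($line);
    }
    close $out or die "Cannot close $out_file: $!";
}
```

Because every non-ASCII character is replaced in step 2, the output is pure ASCII, so the output file needs no encoding layer. For the question's sample, the element line comes out as 1 &#xd7; 2 = 2.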
