user6572950


Convert non-ASCII/UTF-8 characters into LaTeX codes

We need to convert non-ASCII (UTF-8) characters and numeric or named character entities into LaTeX codes. At present we do this in two steps with a Perl script: non-ASCII to Unicode, then Unicode to LaTeX/entity.

For example:

 ó --> \'{o}
 &#243; --> \'{o}
 &oacute; --> \'{o}

Is there a direct way to convert non-ASCII or UTF-8 characters to LaTeX codes in a Perl script?

Upvotes: 2

Views: 1517

Answers (2)

Thomas Lorentz

Reputation: 1

It took me about two hours to find my mistake: Perl was not treating my input string as UTF-8.

You can make Perl decode the standard streams, and any files opened in the same lexical scope, as UTF-8 with: use open ( ":encoding(UTF-8)", ":std" );
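To illustrate why the decoding step matters, here is a minimal sketch using the core Encode module. The variable names are only for illustration, and the byte string stands in for whatever an undecoded filehandle would deliver.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes for "ó" (U+00F3), as read from a handle with no encoding layer.
my $bytes = "\xC3\xB3";

# After decoding, Perl sees one character instead of two bytes.
my $chars = decode('UTF-8', $bytes);

printf "bytes: %d, characters: %d, code point: U+%04X\n",
    length($bytes), length($chars), ord($chars);
# prints: bytes: 2, characters: 1, code point: U+00F3
```

Any character-based lookup, such as a table of LaTeX codes, only matches once the input is in the decoded form.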

Upvotes: -2

Borodin

Reputation: 126722

This is very straightforward using the XML::Entities module to decode the entities and the LaTeX::Encode module to re-encode the resulting characters as LaTeX.

Note that I've explicitly created an alias, xml_decode, for the decoding function, because the module's own name for it is just decode, which is far too imprecise.

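use utf8;
use strict;
use warnings 'all';
use feature 'say';

use XML::Entities ();
use LaTeX::Encode 'latex_encode';

# Alias the decoder under a more descriptive name
*xml_decode = \&XML::Entities::decode;

# Literal character, named entity, numeric entity
for my $s ( 'ó', '&oacute;', '&#243;' ) {
    # Decode any entities to characters, then encode those as LaTeX
    my $reencoded = latex_encode(xml_decode('all', $s));
    say $reencoded;
}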

output

{\'o}
{\'o}
{\'o}
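If installing LaTeX::Encode is not an option, a rough sketch of a hand-rolled fallback is shown below. It is not how either module works internally. It assumes the input is already decoded, it relies on the core Unicode::Normalize module, and both the %accent table and the accents_to_latex helper are hypothetical and cover only five common accents on ASCII letters.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical, deliberately small table: combining mark => LaTeX accent command.
my %accent = (
    "\x{0300}" => q{\`},   # grave
    "\x{0301}" => q{\'},   # acute
    "\x{0302}" => q{\^},   # circumflex
    "\x{0303}" => q{\~},   # tilde
    "\x{0308}" => q{\"},   # diaeresis
);

# Hypothetical helper: split each accented letter into base letter plus
# combining mark (NFD), then rewrite the pair as a LaTeX accent command.
sub accents_to_latex {
    my ($text) = @_;
    my $decomposed = NFD($text);
    $decomposed =~ s/([A-Za-z])([\x{0300}-\x{0303}\x{0308}])/$accent{$2} . '{' . $1 . '}'/ge;
    return $decomposed;
}

print accents_to_latex("\N{U+00F3}"), "\n";   # prints \'{o}
```

This produces the \'{o} form from the question rather than the {\'o} form that LaTeX::Encode emits. Anything outside the table is passed through unchanged, so a maintained module remains the safer choice for real documents.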

Upvotes: 3
