Reputation: 5730
I need to print a Python-style data structure containing Unicode characters from Perl, and I am having difficulties with the encoding.
Python code:
import pprint
flavour = u'süß' # 'sweet' in German
pprint.pprint(flavour)
# Output:
u's\xfc\xdf'
I want to produce that very same output using Perl. I know I can do
use utf8;
my $flavour = 'süß';
$flavour =~ s/ü/\\xfc/g;
$flavour =~ s/ß/\\xdf/g;
print "u'$flavour'\n";
# Output:
u's\xfc\xdf'
But what about other special characters and umlauts? Isn't there an encoding module that would do what I want? I need this to write a Python config file with Perl.
Upvotes: 1
Views: 159
Reputation: 118148
Based on @PM2Ring's helpful comment below:
In Python 2, those Unicode u'' strings need the \x escape sequences for codepoints from 0x80 to 0xff. They use 4 digit \u escapes for codepoints from 0x0100 to 0xffff, and 8 digit \U escapes for higher codepoints.
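The three ranges can be observed from Python itself. This is a small sketch assuming Python 3.3 or later, where the built-in ascii() escapes non-ASCII characters with the same \x, \u and \U forms; the helper name escaped and the sample characters are illustrative choices, not part of the original answer.

```python
# One sample character per escape range described above.
samples = {
    "0x80..0xff": "\u00fc",          # U+00FC, needs a 2-digit \x escape
    "0x0100..0xffff": "\u2554",      # U+2554, needs a 4-digit \u escape
    "above 0xffff": "\U00010c1a",    # U+10C1A, needs an 8-digit \U escape
}

def escaped(text):
    """Hypothetical helper: ascii() output with a u prefix, mimicking Python 2's repr()."""
    return "u" + ascii(text)

for label, ch in samples.items():
    print(label, escaped(ch))
```

Python 3's ascii() omits the u prefix, so the helper adds it back by hand.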
use utf8;
use strict;
use warnings;
use open qw(:std :utf8);
use Test::More;
my @cases = (
    [ 'süß' => q{u's\\xfc\\xdf'} ],
    [ '╔═╗' => q{u'\\u2554\\u2550\\u2557'} ],
    [ '𐰚𐰇𐰚' => q{u'\\U00010c1a\\U00010c07\\U00010c1a'} ],
);
for my $case (@cases) {
    is string_to_python2_escaped($case->[0]), $case->[1], "$case->[0] maps to $case->[1]";
}
done_testing;
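# Wrap the escaped characters in a Python 2 unicode literal: u'...'
sub string_to_python2_escaped {
    sprintf "u'%s'", join '', map char_to_python2_escape($_), split //, $_[0];
}

# Escape a single character according to its codepoint range.
sub char_to_python2_escape {
    my $c = shift;
    my $k = ord($c);
    return $c if $k <= 0x7f;                        # ASCII: emitted unchanged
    return sprintf('\\x%02x', $k) if $k <= 0xff;    # 0x80..0xff
    return sprintf('\\u%04x', $k) if $k <= 0xffff;  # 0x0100..0xffff
    return sprintf('\\U%08x', $k);                  # above 0xffff
}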
Output:
ok 1 - süß maps to u's\xfc\xdf'
ok 2 - ╔═╗ maps to u'\u2554\u2550\u2557'
ok 3 - 𐰚𐰇𐰚 maps to u'\U00010c1a\U00010c07\U00010c1a'
1..3
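Since the goal is a Python config file, it may be worth confirming on the Python side that the generated literals parse back to the original text. The following is a minimal sketch, assuming Python 3.3 or later (which accepts the u prefix); the list literals and the helper parse_literal are hypothetical names used only for illustration.

```python
import ast

# The literals the Perl tests expect; raw strings keep the backslashes intact.
literals = [
    r"u's\xfc\xdf'",
    r"u'\u2554\u2550\u2557'",
    r"u'\U00010c1a\U00010c07\U00010c1a'",
]

def parse_literal(source):
    """Parse one Python string literal without executing arbitrary code."""
    return ast.literal_eval(source)

decoded = [parse_literal(lit) for lit in literals]
for lit, text in zip(literals, decoded):
    # Show the codepoints that each literal decodes to.
    print(lit, "->", [hex(ord(ch)) for ch in text])
```

A check like this should also catch ASCII edge cases: the Perl function passes every character up to 0x7f through unchanged, so an input containing a single quote, a backslash or a newline would likely need extra handling before it forms a valid literal.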
Upvotes: 4