PerlDuck

Reputation: 5730

In Perl, how can I encode strings to a format acceptable to Python 2?

I need to print a Python-style data structure containing Unicode characters from Perl, and I am having difficulties with the encoding.

Python code:

import pprint
flavour = u'süß'  # 'sweet' in German
pprint.pprint(flavour)

# Output:
u's\xfc\xdf'

I want to produce that very same output using Perl. I know I can do

use utf8;
my $flavour = 'süß';
$flavour =~ s/ü/\\xfc/g; 
$flavour =~ s/ß/\\xdf/g; 
print "u'$flavour'\n";

# Output:
u's\xfc\xdf'

But what about all the other special characters and umlauts? Isn't there an Encoding module that would do what I want? I need this to write a Python config file with Perl.

Upvotes: 1

Views: 159

Answers (1)

Sinan Ünür

Reputation: 118148

Based on @PM2Ring's helpful comment below:

In Python 2, the printed form of a Unicode u'' string uses two-digit \x escapes for code points from 0x80 to 0xff, four-digit \u escapes for code points from 0x0100 to 0xffff, and eight-digit \U escapes for higher code points.

use utf8;
use strict;
use warnings;

use open qw(:std :utf8);

use Test::More;

my @cases = (
    [ 'süß'  => q{u's\\xfc\\xdf'} ],
    [ '╔═╗'  => q{u'\\u2554\\u2550\\u2557'} ],
    [ '𐰚𐰇𐰚'  => q{u'\\U00010c1a\\U00010c07\\U00010c1a'} ],
);

for my $case (@cases) {
    is string_to_python2_escaped($case->[0]), $case->[1], "$case->[0] maps to $case->[1]";
}

done_testing;

sub string_to_python2_escaped {
    sprintf "u'%s'", join '', map char_to_python2_escape($_), split //, $_[0];
}

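sub char_to_python2_escape {
    my $c = shift;
    my $k = ord($c);    # code point of the character

    return $c if $k <= 0x7f;                        # ASCII passes through unchanged
    return sprintf('\\x%02x', $k) if $k <= 0xff;    # 0x80..0xff: two-digit \x escape
    return sprintf('\\u%04x', $k) if $k <= 0xffff;  # 0x0100..0xffff: four-digit \u escape
    return sprintf('\\U%08x', $k);                  # above 0xffff: eight-digit \U escape
}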

Output:

ok 1 - süß maps to u's\xfc\xdf'
ok 2 - ╔═╗ maps to u'\u2554\u2550\u2557'
ok 3 - 𐰚𐰇𐰚 maps to u'\U00010c1a\U00010c07\U00010c1a'
1..3
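
One gap worth noting: char_to_python2_escape passes every ASCII character through unchanged, so a single quote, backslash, or newline in the input would produce a broken literal. Below is a minimal sketch of one way to handle that, assuming the output is always wrapped in single quotes. The names char_to_python2_literal and string_to_python2_literal are hypothetical and not part of the code above. Python's own repr may switch to double quotes when the string contains a single quote, so the result should be a valid literal but may not match pprint output character for character.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of the original answer.
# ASCII characters that need a backslash escape inside u'...'.
my %named = (
    "\\" => "\\\\",   # backslash becomes two backslashes
    "'"  => "\\'",    # single quote becomes backslash + quote
    "\t" => "\\t",
    "\n" => "\\n",
    "\r" => "\\r",
);

sub char_to_python2_literal {
    my $c = shift;
    my $k = ord $c;

    return $named{$c} if exists $named{$c};
    return $c if $k >= 0x20 && $k < 0x7f;           # printable ASCII
    return sprintf('\\x%02x', $k) if $k <= 0xff;    # controls, DEL, 0x80..0xff
    return sprintf('\\u%04x', $k) if $k <= 0xffff;
    return sprintf('\\U%08x', $k);
}

sub string_to_python2_literal {
    my $s = shift;
    return "u'" . join('', map { char_to_python2_literal($_) } split //, $s) . "'";
}

# Input is: it's a \ test, followed by a newline character.
print string_to_python2_literal("it's a \\ test\n"), "\n";
# prints: u'it\'s a \\ test\n'
```

The lookup table keeps the special cases in one place, so further named escapes can be added without touching the numeric branches.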

Upvotes: 4
