Chas. Owens

Reputation: 64939

Is there any way to make printf/sprintf handle combining characters correctly?

Combining characters appear to count as whole characters in printf and sprintf's calculations:

[    é]
[   é]

The text above was created by the following code:

#!/usr/bin/perl

use strict;
use warnings;

binmode STDOUT, ":utf8";

for my $s ("\x{e9}", "e\x{301}") {
        printf "[%5s]\n", $s; 
}
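The mismatch can be made visible without relying on how a terminal renders the output. The following is a minimal sketch, not part of the original script; it assumes only core Perl and shows that the field width is measured in code points, which is the same count that length reports:

```perl
use strict;
use warnings;

# length counts code points, and a %5s field width is measured the same way.
for my $s ("\x{e9}", "e\x{301}") {
    my $padded = sprintf "%5s", $s;
    printf "code points: %d, padded length: %d\n", length($s), length($padded);
}
```

Both padded strings are five code points long, but the second one spends two of them on a single visible character, so it gets only three spaces of padding.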

I expected the code to print:

[    é]
[    é]

I don't see any discussion of Unicode, let alone combining characters, in the function descriptions. Are printf and sprintf useless in the face of Unicode? Is this just a bug in Perl 5.20.1 that could be fixed? Is there a replacement someone has written?

Upvotes: 4

Views: 200

Answers (2)

tjd

Reputation: 4104

You should probably be aware of the Perl Unicode Cookbook, in particular ℞ #34, which deals with this very issue. As a bonus, Perl v5.20.2 has it available as perldoc perlunicook.

In any case, the code included in that recipe is as follows:

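use utf8;                  # the source below contains literal UTF-8 text
use feature 'say';         # enables say
use open qw(:std :utf8);   # encode STDOUT as UTF-8
use Unicode::GCString;
use Unicode::Normalize;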

my @words = qw/crème brûlée/;
@words    = map { NFC($_), NFD($_) } @words;

for my $str (@words) {
    my $gcs  = Unicode::GCString->new($str);
    my $cols = $gcs->columns;
    my $pad  = " " x (10 - $cols);
    say $str, $pad, " |";
}

Upvotes: 2

Chas. Owens

Reputation: 64939

It looks like the answer is to use Unicode::GCString and pad by display columns rather than by code points:

#!/usr/bin/perl

use strict;
use warnings;

use Unicode::GCString;

binmode STDOUT, ":utf8";

for my $s ("\x{e9}", "e\x{301}", "e\x{301}\x{302}") {
        printf "[%s]\n", pad($s, 5);
}

sub pad {
        my ($s, $length) = @_;
        my $gcs = Unicode::GCString->new($s);
        return((" " x ($length - $gcs->columns)) . $s);
}
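If installing Unicode::GCString is not an option, a rough core-only alternative is to count extended grapheme clusters with \X. This is a sketch under stated assumptions: pad_clusters is a hypothetical helper name, and it counts clusters rather than display columns, so it will be off for wide (e.g. East Asian) or zero-width characters, which Unicode::GCString's columns method is designed to handle.

```perl
#!/usr/bin/perl

use strict;
use warnings;

binmode STDOUT, ":utf8";

# Hypothetical helper: right-align $s in a field of $width grapheme clusters.
# \X matches one extended grapheme cluster, so a base character together
# with its combining marks is counted once.
sub pad_clusters {
    my ($s, $width) = @_;
    my $clusters = () = $s =~ /\X/g;   # number of \X matches in $s
    my $fill = $width - $clusters;
    $fill = 0 if $fill < 0;            # never pad by a negative amount
    return (" " x $fill) . $s;
}

for my $s ("\x{e9}", "e\x{301}", "e\x{301}\x{302}") {
    printf "[%s]\n", pad_clusters($s, 5);
}
```

For the three test strings this gives four spaces of padding each time, since each one is a single grapheme cluster.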

Upvotes: 4
