Reputation: 64939
Combining characters appear to count as whole characters in printf and sprintf's calculations:
[    é]
[   é]
The text above was created by the following code:
#!/usr/bin/perl
use strict;
use warnings;
binmode STDOUT, ":utf8";
for my $s ("\x{e9}", "e\x{301}") {
    printf "[%5s]\n", $s;
}
I expected the code to print:
[    é]
[    é]
I don't see any discussion of Unicode, let alone combining characters, in the function descriptions. Are printf and sprintf useless in the face of Unicode? Is this just a bug in Perl 5.20.1 that could be fixed? Is there a replacement someone has written?
Upvotes: 4
Views: 200
Reputation: 4104
You should probably be aware of the Perl Unicode Cookbook, in particular ℞ #34, which deals with this very issue. As a bonus, Perl v5.20.2 has it available as perldoc perlunicook.
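In any case, the code from that recipe is as follows (with the preamble it assumes added and the missing sigil in say $str restored):
use utf8;                  # the source contains literal non-ASCII characters
use feature 'say';
use Unicode::GCString;
use Unicode::Normalize;
binmode STDOUT, ":utf8";
my @words = qw/crème brûlée/;
# pad both the precomposed (NFC) and decomposed (NFD) form of each word
@words = map { NFC($_), NFD($_) } @words;
for my $str (@words) {
    my $gcs = Unicode::GCString->new($str);
    my $cols = $gcs->columns;          # display columns, not code points
    my $pad = " " x (10 - $cols);
    say $str, $pad, " |";
}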
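To see why the recipe pads both normalization forms, here is a minimal sketch that needs only core modules. The word is written with an escape so no use utf8 is required, and the 7-character field width is an arbitrary choice for illustration:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

binmode STDOUT, ":utf8";

my $word = "cr\x{e8}me";    # "crème" with a precomposed e-grave

for my $form (NFC($word), NFD($word)) {
    # length() counts code points, the same unit printf's field width uses
    printf "%d code points: [%7s]\n", length($form), $form;
}
```

The NFD form has one more code point than the NFC form, so printf gives it one less space of padding even though both render in the same number of columns.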
Upvotes: 2
Reputation: 64939
It looks like the answer is to use Unicode::GCString:
#!/usr/bin/perl
use strict;
use warnings;
use Unicode::GCString;
binmode STDOUT, ":utf8";
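# precomposed é, e + combining acute, e + combining acute + combining circumflex
for my $s ("\x{e9}", "e\x{301}", "e\x{301}\x{302}") {
    printf "[%s]\n", pad($s, 5);
}
# left-pad $s with spaces to $length display columns
sub pad {
    my ($s, $length) = @_;
    my $gcs = Unicode::GCString->new($s);
    return((" " x ($length - $gcs->columns)) . $s);
}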
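If installing Unicode::GCString is not an option, a rough core-only approximation is to count extended grapheme clusters with the \X regex escape. This is only a sketch: pad_graphemes is a hypothetical helper name, and it assumes every cluster occupies exactly one column, which does not hold for wide (for example East Asian) or zero-width characters. Unicode::GCString handles those cases.

```perl
use strict;
use warnings;

binmode STDOUT, ":utf8";

# Hypothetical helper: left-pad $s to $width grapheme clusters.
# Assumption: one grapheme cluster == one display column.
sub pad_graphemes {
    my ($s, $width) = @_;
    my $clusters = () = $s =~ /\X/g;    # count extended grapheme clusters
    my $missing  = $width - $clusters;
    $missing = 0 if $missing < 0;       # never pass a negative count to x
    return (" " x $missing) . $s;
}

for my $s ("\x{e9}", "e\x{301}", "e\x{301}\x{302}") {
    printf "[%s]\n", pad_graphemes($s, 5);
}
```

Each of the three strings is a single cluster, so each gets four spaces of padding regardless of how many combining marks it carries.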
Upvotes: 4