Eugene Barsky

Reputation: 6002

s/// and combined diacritics in Perl 6

Another side of this problem. When I try to substitute part of a combined character, Perl 6 by default won't split it.

my $p_macron = "p" ~ 0x0304.chr; 
say $p_macron; # "p̄" 
(my $a_macron = $p_macron) ~~ s/p/a/;
say $a_macron; # OOPS, again "p̄"

How can I (temporarily) switch off this default, so that I can match a single Unicode code point rather than a combined character? Here is the behaviour I want, shown with sed in bash:

$ echo p̄ | sed 's/p/a/'
ā

Upvotes: 3

Views: 82

Answers (1)

Brad Gilbert

Reputation: 34120

sed doesn't work on Unicode graphemes; it works on bytes (at best, on individual code points), so when it is given p̄ it sees 0x<0070 0304>, that is, p followed by a separate combining macron. Perl 6 properly sees it as one grapheme and treats it as such, which means s/p/a/ does absolutely nothing, because p doesn't match p̄.

You could have tried s:ignoremark/p/a/ (short form :m), which would have given you a,
or s:samemark/p/a/ (short form :mm), which would have given you ā.
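
To make the difference concrete, here is a minimal sketch that reuses $p_macron from the question. The names $plain and $marked are made up for illustration, and the results in the comments are the ones described above, not output from any particular compiler release.

```raku
my $p_macron = "p" ~ 0x0304.chr;   # "p" followed by COMBINING MACRON

say $p_macron.chars;   # 1, because Perl 6 counts graphemes
say $p_macron.codes;   # 2, because there are two code points underneath

# :ignoremark matches the base character and drops the mark on replacement
(my $plain = $p_macron) ~~ s:ignoremark/p/a/;
say $plain;            # a

# :samemark matches the base character and carries the mark over
(my $marked = $p_macron) ~~ s:samemark/p/a/;
say $marked;           # ā
```

Neither adverb switches grapheme handling off. They only tell the regex engine to compare base characters, which is usually enough for this kind of substitution.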

Upvotes: 4
