Reputation: 6002
Another side of this problem: when I try to substitute part of a combined character, Perl 6 by default doesn't split it.
my $p_macron = "p" ~ 0x0304.chr;
say $p_macron; # "p̄"
(my $a_macron = $p_macron) ~~ s/p/a/;
say $a_macron; # OOPS, again "p̄"
How can I (temporarily) switch off this default, so that I can match a single Unicode symbol, not a combined one?
Here is how it is done in bash:
$ echo p̄ | sed 's/p/a/'
ā
Upvotes: 3
Views: 82
Reputation: 34120
sed knows nothing about Unicode graphemes; it works on bytes (or, depending on the locale, on individual codepoints at most), so when it is given p̄ it sees the two codepoints 0x<0070 0304>, whereas Perl 6 properly sees it as one grapheme and treats it as such. That means s/p/a/ does absolutely nothing, because p doesn't match p̄.
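The difference between graphemes and codepoints can be checked directly. This is only a minimal sketch, assuming a reasonably recent Rakudo; it reuses the variable from the question and only built-in Str methods:

```raku
my $p_macron = "p" ~ 0x0304.chr;

say $p_macron.chars;       # 1  (counted in graphemes)
say $p_macron.codes;       # 2  (counted in codepoints)
say $p_macron.NFD;         # NFD:0x<0070 0304>
say so $p_macron ~~ /p/;   # False: a bare p is not the grapheme p̄
```

The regex engine matches at the level that .chars counts, which is why the plain substitution leaves the string untouched.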
You could have tried s:ignoremark/p/a/ (:m), which would have given you a, or s:samemark/p/a/ (:mm), which would have given you ā.
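Applied to the string from the question, the two adverbs might be used as follows. This is a sketch rather than a definitive recipe; the variable names $plain and $marked are made up for illustration, and the outputs shown are the ones described above:

```raku
my $p_macron = "p" ~ 0x0304.chr;

# :ignoremark (:m) compares base characters only, so p matches p̄;
# the whole grapheme is replaced and the macron is lost.
(my $plain = $p_macron) ~~ s:ignoremark/p/a/;
say $plain;    # a

# :samemark (:mm) matches the same way, but copies the marks of the
# matched text onto the replacement.
(my $marked = $p_macron) ~~ s:samemark/p/a/;
say $marked;   # ā
```

Because both are adverbs on a single substitution, the grapheme-level default stays in force everywhere else, which is about as "temporary" as the switch can get.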
Upvotes: 4