Reputation: 63902
I have this script:
use 5.014;
use warnings;
use utf8;
binmode STDOUT, ':utf8';
my $str = "XYZ ΦΨΩ zyz φψω";
my @greek = ($str =~ /\p{Greek}/g);
say "Greek: @greek";
my @upper = ($str =~ /\p{Upper}/g);
say "Upper: @upper";
#my @upper_greek = ($str =~ /\p{Upper+Greek}/); #wrong.
#say "Upper+Greek: @upper_greek";
Is it possible to combine multiple Unicode properties? E.g., how do I select only the characters that are both Upper and Greek, and get the desired output:
Greek: Φ Ψ Ω φ ψ ω
Upper: X Y Z Φ Ψ Ω
Upper+Greek: Φ Ψ Ω #<-- how to get this?
Upvotes: 9
Views: 313
Reputation: 385754
We want to perform an AND operation, so we can't use
/(?:\p{Greek}|\p{Upper})/ # Greek OR Upper
or
/[\p{Greek}\p{Upper}]/ # Greek OR Upper
Since 5.18, one can use regex sets.
/(?[ \p{Greek} & \p{Upper} ])/ # Greek AND Upper
Before 5.36, this requires use experimental qw( regex_sets ); to silence the "experimental" warning. It's safe to add this and use the feature as far back as its introduction as an experimental feature in 5.18, since no change was made to the feature since then.
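For reference, here is a minimal sketch of the regex-set approach applied to the string from the question. It assumes the experimental module is installed (as far as I know it has been bundled with Perl since 5.20; on 5.18 it would have to come from CPAN), and the variable names simply mirror the question's.

```perl
use 5.018;
use warnings;
use experimental qw( regex_sets );   # only needed before 5.36; harmless afterwards
use utf8;
binmode STDOUT, ':utf8';

my $str = "XYZ ΦΨΩ zyz φψω";

# Inside (?[ ... ]), & is set intersection: characters that are Greek AND uppercase.
my @upper_greek = $str =~ /(?[ \p{Greek} & \p{Upper} ])/g;

say "Upper+Greek: @upper_greek";
```

Whitespace inside the (?[ ... ]) construct is ignored, so the operands can be spaced out for readability.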
There are some other approaches that can be used in older versions of Perl, but they are indisputably harder to read.
One way of achieving AND in a regex is using lookarounds.
/\p{Greek}(?<=\p{Upper})/ # Greek AND Upper
Another way of getting an AND is to negate an OR. De Morgan's laws tell us
NOT( Greek AND Upper ) ⇔ NOT(Greek) OR NOT(Upper)
so
Greek AND Upper ⇔ NOT( NOT(Greek) OR NOT(Upper) )
This gives us
/[^\P{Greek}\P{Upper}]/ # Greek AND Upper
This is more efficient than using a lookbehind.
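As a quick sanity check, the following sketch runs the two pre-5.18 patterns against the string from the question. The array names are made up for illustration; both patterns should select the same three characters.

```perl
use 5.014;
use warnings;
use utf8;
binmode STDOUT, ':utf8';

my $str = "XYZ ΦΨΩ zyz φψω";

# Lookbehind: consume a Greek character, then assert that the
# character just consumed is also uppercase.
my @via_lookbehind = $str =~ /\p{Greek}(?<=\p{Upper})/g;

# Negated class: reject anything that is non-Greek or non-uppercase,
# which leaves only characters that are both Greek and uppercase.
my @via_negation = $str =~ /[^\P{Greek}\P{Upper}]/g;

say "Lookbehind: @via_lookbehind";
say "Negation:   @via_negation";
```

Note the capital \P in the negated class: it is the complement of \p, so the class lists exactly the characters to exclude.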
Upvotes: 12
Reputation: 22254
This works in 5.14.0 as well:
sub InUpperGreek {
return <<'END'
+utf8::Greek
&utf8::Upper
END
}
my @upper_greek = ($str =~ /\p{InUpperGreek}/g);
say "Upper Greek: @upper_greek";
Not sure if that's simpler. :) For more information on how this works, see the perlunicode documentation on user-defined character properties.
Upvotes: 7