Reputation: 303
I have the following regexp to check XML element names.
my $NameStartChar = ':A-Z_a-z\x{C0}-\x{D6}\x{D8}-\x{F6}\x{F8}-\x{2FF}\x{0370}-\x{037D}\x{37F}-\x{1FFF}\x{200C}-\x{200D}\x{2070}-\x{218F}\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}\x{10000}-\x{EFFFF}';
my $NameChar = ':A-Z_a-z\x{C0}-\x{D6}\x{D8}-\x{F6}\x{F8}-\x{2FF}\x{370}-\x{37D}\x{37F}-\x{1FFF}\x{200C}-\x{200D}\x{2070}-\x{218F}\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}\x{10000}-\x{EFFFF}\-\.0-9\x{B7}\x{0300}-\x{036F}\x{203F}-\x{2040}';
sub checkXmlName ($)
# Check if input is valid XML name
# $arg - Input string
# $ret - Boolean of validity
{
if ($_[0] =~ m/^[$NameStartChar]([$NameChar])*$/)
{ return 1; }
else
{ return ""; }
}
if (checkXmlName("foo"))
{
print STDOUT "OK";
}
This gives the following error:
Invalid [] range "\x{F8}-\x{2FF}" in regex; marked by <-- HERE in m/^[:A-Z_a-z\x{C0}-\x{D6}\x{D8}-\x{F6}\x{F8}-\x{2FF} <-- HERE
On Perl 5.16.2 I use the \N{U+2FF} form of characters, but I am required to use Perl 5.8.8.
EDIT:
Changed qw to qr, which didn't change the error and added a second one: Unicode character 0xeffff is illegal at ...
EDIT: Following ikegami's comment, removed qr/, which eliminated the illegal character error.
Upvotes: 0
Views: 1503
Reputation: 385847
[\x{F8}-\x{2FF}]
should work, so this is a bug in Perl. The same range does work in newer versions of Perl, so the bug has since been fixed.
It looks like the regex engine has problems with ranges that span from single-byte characters to wider ones, so try splitting the range in two:
[\x{F8}-\x{FF}\x{100}-\x{2FF}]
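Applied to the code in the question, the workaround might look like the sketch below. It assumes the problem is limited to ranges that cross the \x{FF} boundary; \x{F8}-\x{2FF} is the only such range in these two classes. Building $NameChar from $NameStartChar is a convenience added here, not something the original code did, and the sketch has not been tested on 5.8.8.

```perl
use strict;
use warnings;

# Same classes as in the question, except that \x{F8}-\x{2FF} is split
# at the single-byte boundary into \x{F8}-\x{FF} and \x{100}-\x{2FF}.
my $NameStartChar = ':A-Z_a-z\x{C0}-\x{D6}\x{D8}-\x{F6}\x{F8}-\x{FF}\x{100}-\x{2FF}'
                  . '\x{370}-\x{37D}\x{37F}-\x{1FFF}\x{200C}-\x{200D}\x{2070}-\x{218F}'
                  . '\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}'
                  . '\x{10000}-\x{EFFFF}';

# NameChar is NameStartChar plus the extra characters allowed after the
# first position (assumption: reusing the string keeps both in sync).
my $NameChar = $NameStartChar . '\-\.0-9\x{B7}\x{0300}-\x{036F}\x{203F}-\x{2040}';

# Return 1 if the argument is a valid XML name, "" otherwise.
sub checkXmlName {
    my ($name) = @_;
    return $name =~ m/^[$NameStartChar][$NameChar]*$/ ? 1 : "";
}

print checkXmlName("foo")  ? "OK\n" : "not OK\n";   # valid name
print checkXmlName("1foo") ? "OK\n" : "not OK\n";   # invalid: starts with a digit
```

The strings stay single-quoted so that the \x{...} escapes reach the regex engine as text and are only interpreted when the pattern is compiled.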
Upvotes: 2