Reputation:
I'm running Ubuntu 8.04 and my code looks like this...
for (i=1;i<=n;i++)
{
if (arr[i] ~ /^[A-Z]{2,4}$/) printf(arr[i])
}
I quickly discovered that the {n,m} interval expression won't work in gawk without the --posix switch. Once that is enabled the expression works, but the matching is case-insensitive: it matches both AAAA and aaaa. What is going on here?
Upvotes: 0
Views: 981
Reputation: 28000
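Otherwise, if you're using GNU awk, you could use the [:upper:] uppercase character class instead of the [A-Z] range. Note that the class has to sit inside a bracket expression, i.e. [[:upper:]]; a bare /[:upper:]/ only matches one of the characters :, u, p, e or r.
% awk '{print /[[:upper:]]/?"OK":"KO"}'
AA
OK
aa
KO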
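A minimal sketch applying the class to the question's loop, assuming a POSIX shell and an awk that supports bracket character classes (gawk does; very old mawk releases may not). The helper name upper_2to4 and the sample input are made up for illustration, and split() is only a stand-in for however the question fills arr.

```shell
#!/bin/sh
# upper_2to4: hypothetical helper. Reads whitespace-separated words on stdin
# and prints each word made of 2 to 4 uppercase letters, one per line.
upper_2to4() {
  awk '{
    # Fill arr[1..n] with the fields of the line, as the question assumes.
    n = split($0, arr)
    # [[:upper:]] is a character class, so it does not depend on how the
    # locale collates the range A-Z. Repeating it replaces {2,4}, which
    # older gawk only honours with --posix or --re-interval.
    for (i = 1; i <= n; i++)
      if (arr[i] ~ /^[[:upper:]][[:upper:]][[:upper:]]?[[:upper:]]?$/)
        print arr[i]   # print rather than printf(arr[i]), so a % in the data is not read as a format
  }'
}

echo 'AAAA aaaa AA tt ABCDE Zz' | upper_2to4
```

With that input only AAAA and AA are printed: aaaa and tt are lowercase, ABCDE is too long, and Zz contains a lowercase letter.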
Upvotes: 0
Reputation: 47762
The expression itself works for me:
dfs:~# gawk --posix '/^[A-Z]{2,4}$/ {print "Yes"}'
AAAA
Yes
AA
Yes
TT
Yes
tt
YY
Yes
yy
Your problem may have one of two causes. Either you accidentally set the IGNORECASE awk variable or otherwise turned on case-insensitive matching (by the way, IGNORECASE has no effect with --posix, but it does with --re-interval, which also enables braces in regular expressions), or it is the classic problem of the locale's collating sequence: gawk compares characters in a locale-aware way, so in many locales the lowercase letters sort between the uppercase ones and a range such as [A-Z] can match lowercase letters too. Quote from the relevant part of the manual:
Many locales sort characters in dictionary order, and in these locales, ‘[a-dx-z]’ is typically not equivalent to ‘[abcdxyz]’; instead it might be equivalent to ‘[aBbCcDdxXyYz]’, for example. To obtain the traditional interpretation of bracket expressions, you can use the C locale by setting the LC_ALL environment variable to the value ‘C’.
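The C-locale workaround from the quote can be tried with a minimal sketch, assuming a POSIX shell; the helper name upper_2to4_c is hypothetical and the sample words mirror the transcript above. It spells the 2 to 4 repetition out instead of writing {2,4}, so it does not rely on --posix or --re-interval; with gawk you could keep {2,4} and pass --re-interval instead.

```shell
#!/bin/sh
# upper_2to4_c: hypothetical helper. Prints input lines consisting of 2 to 4
# characters from A-Z, evaluated under the C locale.
upper_2to4_c() {
  # LC_ALL=C applies only to this awk process, so [A-Z] means exactly the
  # 26 ASCII capitals. The optional [A-Z]? copies stand in for {2,4}.
  LC_ALL=C awk '/^[A-Z][A-Z][A-Z]?[A-Z]?$/'
}

printf '%s\n' AAAA aaaa AA TT tt YY yy | upper_2to4_c
```

Only the uppercase lines (AAAA, AA, TT, YY) are printed.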
Upvotes: 5
Reputation: 51
I only have mawk installed, but maybe this is what you're looking for?
for (i=1;i<=n;i++) { if (arr[i] ~ /^[A-Z]{2,4}$/) printf "%s", arr[i] }
Upvotes: 0