AWK - My regexp won't respect case

I'm running Ubuntu 8.04 and my code looks like this...

 for (i=1;i<=n;i++)
 {
  if (arr[i] ~ /^[A-Z]{2,4}$/) printf(arr[i])
 }

I quickly discovered that the {n} expression won't work in gawk without the --posix switch. Once it is enabled, the expression works, but it is case-insensitive, matching both AAAA and aaaa. What is going on here?

Upvotes: 0

Views: 981

Answers (3)

Dimitre Radoulov

Reputation: 28000

If you're using GNU awk, you could use the [:upper:] alphabetic character class inside a bracket expression, i.e. [[:upper:]].

% awk '{print /[[:upper:]]/?"OK":"KO"}'
AA
OK
aa
KO
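
For awk builds where interval expressions such as {2,4} are unavailable (the question notes that gawk needed --posix for them), the same "two to four uppercase letters" test can be sketched with the POSIX class plus a length check. The function name is_upper_code is made up for this illustration and does not come from the thread.

```shell
# Hypothetical helper: prints OK for lines made only of 2 to 4 uppercase letters, KO otherwise.
is_upper_code() {
  awk '{
    # [[:upper:]] needs both pairs of brackets; length() stands in for the {2,4} interval.
    ok = ($0 ~ /^[[:upper:]]+$/ && length($0) >= 2 && length($0) <= 4)
    print (ok ? "OK" : "KO")
  }'
}

# Example run: AA passes, aa is lowercase, AAAAA is too long.
printf 'AA\naa\nAAAAA\n' | is_upper_code
```

Because [[:upper:]] is defined by character type rather than by collation order, it should not be affected by how the locale sorts letters within a range.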

Upvotes: 0

jpalecek

Reputation: 47762

The expression itself works for me:

dfs:~# gawk --posix '/^[A-Z]{2,4}$/ {print "Yes"}'
AAAA
Yes
AA
Yes
TT
Yes
tt
YY
Yes
yy

Your problem may have one of two causes. Either you accidentally set the IGNORECASE awk variable or otherwise turned on case-insensitive matching (BTW, IGNORECASE doesn't work with --posix, but does with --re-interval, which also enables the braces in regular expressions), or it is the classic problem of the locale's collating sequence: gawk does locale-aware character comparison, so in some locales the lowercase characters sort between the uppercase ones and fall inside the range A-Z. Quote from the relevant part of the manual:

Many locales sort characters in dictionary order, and in these locales, ‘[a-dx-z]’ is typically not equivalent to ‘[abcdxyz]’; instead it might be equivalent to ‘[aBbCcDdxXyYz]’, for example. To obtain the traditional interpretation of bracket expressions, you can use the C locale by setting the LC_ALL environment variable to the value ‘C’.
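
To try the manual's suggestion, the locale can be forced for a single awk invocation instead of the whole shell session. This is only a sketch, not a diagnosis of the asker's setup: the function name match_upper_c is invented here, and the length check stands in for {2,4} so that the example also runs on awks without interval support.

```shell
# Hypothetical helper: runs awk in the C locale so [A-Z] means exactly the 26 ASCII uppercase letters.
match_upper_c() {
  # LC_ALL=C applies to this one command only; the caller's locale is untouched.
  LC_ALL=C awk '/^[A-Z]+$/ && length($0) >= 2 && length($0) <= 4 { print "Yes" }'
}

# Example run: only AAAA and TT should produce a "Yes".
printf 'AAAA\naaaa\nTT\ntt\n' | match_upper_c
```

If the lowercase inputs stop matching once LC_ALL=C is set, the collating sequence was the likely cause; if they still match, IGNORECASE is the next thing to check.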

Upvotes: 5

pseudosaint

Reputation: 51

I only have mawk installed, but maybe this is what you're looking for?

for (i=1;i<=n;i++) { if (arr[i] ~ /^[A-Z]{2,4}$/) printf(arr[i]) }

Upvotes: 0
