Reputation: 1
I am using GNU grep 2.6.3 on Ubuntu 10.10. While brushing up on my regex skills in preparation for an upcoming training course, I am getting an unexpected hit on the following.
I have a file named strings.regex.txt with the following content:
STRING1 Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)
STRING2 Mozilla/4.75 [en](X11;U;Linux2.2.16-22 i586)
This grep command:
grep 'x[0-9A-Z]' strings.regex.txt
Results in:
STRING1 Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)
STRING2 Mozilla/4.75 [en](X11;U;Linux2.2.16-22 i586)
I expected this as the result:
STRING2 Mozilla/4.75 [en](X11;U;Linux2.2.16-22 i586)
Can anyone explain why I am getting the above result? The first line of the grep output does not contain a match for the regular expression x[0-9A-Z]. It would have matched x[0-9a-z], x[0-9A-Za-z], or a number of other regular expressions, but as I learned and understand regular expressions, it should not have matched this one.
Here are some additional grep commands and the resulting output:
grep -o 'x[0-9A-Z]' strings.regex.txt
x2
(I expected this and it supports my current understanding of regular expressions.)
grep -oc 'x[0-9A-Z]' strings.regex.txt
2
(I did not expect this. I expected 1.)
grep -c 'x[0-9A-Z]' strings.regex.txt
2
(I did not expect this. I expected 1.)
Upvotes: 0
Views: 261
Reputation: 45670
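Add LC_ALL=C before the grep command. In many non-C locales (en_US.UTF-8, for example) the collating order interleaves uppercase and lowercase letters, so the range [A-Z] can also cover lowercase letters; that is most likely why the "xt" in "DigExt" matches. For example: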
$ grep -c 'x[0-9A-Z]' strings.regex.txt
2
$ LC_ALL=C grep -c 'x[0-9A-Z]' strings.regex.txt
1
From the grep man page:
LC_ALL
LC_COLLATE
LANG
These variables specify the locale for the LC_COLLATE category,
which determines the collating sequence used to interpret range
expressions like ‘[a-z]’.
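As a minimal sketch of the fix above, the following script recreates the two sample lines from the question in a temporary file and counts matching lines two ways: once with the C locale forced for a single command, and once with POSIX bracket classes instead of ranges. The temporary file and the variable names (sample, c_locale_count, class_count) are illustrative assumptions, not part of the original question.

```shell
#!/bin/sh
# Recreate the two sample lines from the question in a temporary file
# (a stand-in for strings.regex.txt).
sample=$(mktemp)
printf '%s\n' \
  'STRING1 Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)' \
  'STRING2 Mozilla/4.75 [en](X11;U;Linux2.2.16-22 i586)' > "$sample"

# Option 1: force the C locale for this one command, so [A-Z] is
# interpreted by byte value and covers only uppercase ASCII letters.
c_locale_count=$(LC_ALL=C grep -c 'x[0-9A-Z]' "$sample")

# Option 2: use POSIX character classes, which do not rely on the
# collating order of a range expression.
class_count=$(grep -c 'x[[:digit:][:upper:]]' "$sample")

echo "C locale: $c_locale_count, POSIX classes: $class_count"

rm -f "$sample"
```

Both counts should be 1 here, because only the "x2" in "Linux2.2.16-22" is an x followed by a digit or an uppercase letter. Prefixing a single command with LC_ALL=C leaves the rest of the shell session's locale untouched, while the bracket-class form keeps working if the pattern is later reused under a different locale.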
Upvotes: 1