Reputation: 31
I've been self-studying shell scripting for a while now, and I came across a section of a Linux Fundamentals manual about grep and curly braces {}. My problem is that when I use {} to ask grep for a pattern repeated between a minimum and a maximum number of times, the output includes lines that exceed the maximum I specified.
Here is what happened:
Express11:~/unix_training/reg_ex # cat reg_file2
ll
lol
lool
loool
loooose
Express11:~/unix_training/reg_ex # grep -E 'o{2,3}' reg_file2
lool
loool
loooose
Express11:~/unix_training/reg_ex #
According to the manual, this should not happen, since I am specifying that I only want strings containing two to three consecutive o's.
EDIT: Actually, the reason I did not understand how the curly braces work was this simplistic explanation in the manual. I quote:
19.4.10. between n and m times
And here we demand exactly from minimum 2 to maximum 3 times.
paul@debian7:~$ cat list2
ll
lol
lool
loool
paul@debian7:~$ grep -E 'o{2,3}' list2
lool
loool
paul@debian7:~$ grep 'o\{2,3\}' list2
lool
loool
paul@debian7:~$ cat list2 | sed 's/o\{2,3\}/A/'
ll
lol
lAl
lAl
paul@debian7:~$
Thanks to all those who replied.
Upvotes: 1
Views: 9611
Reputation: 1582
This comes down to how regex matching works.
The pattern o{2,3} makes grep go through each line looking for oo or ooo anywhere in it. As long as there is a match, grep prints that line. Since the pattern has no other constraints, the output of grep -E 'o{2,3}' reg_file2 is correct: loooose contains ooo.
I guess you only want lines with a run of exactly two or three consecutive o's. In that case you need a regex like the one in Raj's answer, which matches oo or ooo that is neither preceded nor followed by another o.
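To make the point about line selection concrete, here is a small sketch using the sample lines from the question. It assumes nothing beyond standard grep; the variable names are only for illustration. For the purpose of choosing lines, o{2,3} selects exactly the same lines as a plain oo, because any line that contains oo already satisfies the minimum.

```shell
# Sample lines from the question, held in a variable instead of a file.
sample=$(printf '%s\n' ll lol lool loool loooose)

# Lines selected by the interval expression.
with_interval=$(printf '%s\n' "$sample" | grep -E 'o{2,3}')

# Lines selected by simply asking for two consecutive o's.
with_plain_oo=$(printf '%s\n' "$sample" | grep 'oo')

if [ "$with_interval" = "$with_plain_oo" ]; then
    echo "both patterns select the same lines:"
    printf '%s\n' "$with_interval"
fi
```

The upper bound only changes how much of the line the match covers, not whether the line is printed.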
Upvotes: 2
Reputation: 174826
# grep -E 'o{2,3}' reg_file2
lool
loool
loooose
The command works as intended: it matches the first three o's in the last line, which is why that line also appears in the output.
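A quick way to see which part of each line is matched is to borrow the sed substitution from the manual excerpt and run it on the lines grep printed. This is just a sketch; the marker A and the variable name are arbitrary.

```shell
# Replace the first run of 2 to 3 o's on each line with the marker A,
# so the matched portion becomes visible.
marked=$(printf '%s\n' lool loool loooose | sed 's/o\{2,3\}/A/')
printf '%s\n' "$marked"
# prints:
# lAl
# lAl
# lAose
```

In loooose only the first three o's are replaced and the fourth one is left over, which shows that the regex never had to account for it.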
I think the command you're actually looking for is this (-P needs a grep built with PCRE support, such as GNU grep):
$ grep -P '(?<!o)o{2,3}(?!o)' file
lool
loool
Explanation:
(?<!o) is a negative lookbehind which asserts that the match won't be preceded by the letter o.
o{2,3} matches 2 or 3 o's.
(?!o) is a negative lookahead which asserts that the match won't be followed by the letter o.
OR
$ grep -E '(^|[^o])o{2,3}($|[^o])' file
lool
loool
Explanation:
(^|[^o]) matches the start of a line (^) or any character other than o.
o{2,3} matches 2 or 3 o's.
($|[^o]) matches the end of the line ($) or any character other than o.
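If you are not sure whether your grep supports -P, the two variants above can be wrapped together. The following is only a sketch: exact_o_run is a made-up helper name, and the probe simply tries a trivial -P match and falls back to the -E pattern when that fails.

```shell
# exact_o_run is a hypothetical helper, not part of grep.
# It prints lines that contain a run of exactly 2 or 3 o's.
exact_o_run() {
    # Probe for PCRE support; some grep builds reject -P.
    if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
        grep -P '(?<!o)o{2,3}(?!o)' "$@"
    else
        grep -E '(^|[^o])o{2,3}($|[^o])' "$@"
    fi
}

# Sample lines from the question, plus a bare "oo" to exercise ^ and $.
result=$(printf '%s\n' ll lol lool loool loooose oo | exact_o_run)
printf '%s\n' "$result"
# prints:
# lool
# loool
# oo
```

Both patterns select the same lines. They differ in what counts as the match: the -E version also consumes the neighbouring non-o characters, which matters if you add -o to print only the matched text.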
Upvotes: 5