Reputation: 7575
Let's say I wanted to find all 12-letter words in /usr/share/dict/words that started with c and ended with er. Off the top of my head, a workable pattern could look something like:
grep -E '^c.{9}er$' /usr/share/dict/words
It finds:
cabinetmaker
calcographer
calligrapher
campanologer
campylometer
...
But that .{9} bothers me. It feels too magical, subtracting the total length of all the anchor characters from the number defined in the original constraint.
Is there any way to rewrite this regex so it doesn't require doing this calculation up front, allowing a literal 12 to be used directly in the pattern?
Upvotes: 3
Views: 97
Reputation: 70750
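You can use the -x option, which selects only matches that exactly match the whole line. Use it in both stages so that the second grep is anchored as well:
grep -xE '.{12}' /usr/share/dict/words | grep -x 'c.*er'
Or use the -P option (where available, as in GNU grep), which makes grep interpret the pattern as a Perl-compatible regular expression, and use a lookahead assertion.
grep -P '^(?=.{12}$)c.*er$' /usr/share/dict/words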
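As a quick sanity check, the two-stage -x approach can be tried on a small inline word list instead of the system dictionary, whose contents vary between machines. This is only a sketch: the sample words and the variable names words and matches are made up for illustration.

```shell
# Hypothetical sample input; the real /usr/share/dict/words differs per system.
words='cabinetmaker
calligrapher
caterpillars
cheeseburger
photographer
character'

# Stage 1 keeps only lines that are exactly 12 characters long.
# Stage 2 keeps only lines that start with c and end with er (-x anchors both ends).
matches=$(printf '%s\n' "$words" | grep -xE '.{12}' | grep -x 'c.*er')
printf '%s\n' "$matches"
```

Here caterpillars and photographer pass the length filter but fail the second stage, and character fails the length filter, so the literal 12 never has to be adjusted for the prefix or suffix.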
Upvotes: 2
Reputation: 85883
One approach with GNU sed:
$ sed -nr '/^.{12}$/{/^c.*er$/p}' words
With BSD sed (Mac OS) it would be:
$ sed -nE '/^.{12}$/{/^c.*er$/p;}' words
Upvotes: 0
Reputation: 757
I don't know grep so well, but some more advanced NFA regex implementations provide lookaheads and lookbehinds. If you can find a way to make those available, you could write:
^(?=c).{12}(?<=er)$
Maybe as a perl one-liner like this? (Single quotes keep the shell from touching the $ in the pattern, and perl can read the file directly without cat.)
perl -ne 'print if /^(?=c).{12}(?<=er)$/' /usr/share/dict/words
Upvotes: 0
Reputation: 786091
You can use awk as an alternative and avoid this calculation (awk regexes have no lazy quantifiers, so a plain .* is used):
awk -v len=12 'length($1)==len && $1 ~ /^c.*er$/' file
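If the length, prefix and suffix all need to vary, one possible generalization is a small shell wrapper around awk that compares literal strings instead of building a regex. This is a minimal sketch under assumptions: find_words and its argument order are hypothetical names invented here, not part of awk or any standard tool.

```shell
# find_words LEN PREFIX SUFFIX [FILE...]
# Hypothetical helper: prints lines of exactly LEN characters that start with
# PREFIX and end with SUFFIX. Reads standard input when no FILE is given.
find_words() {
    len=$1 pre=$2 suf=$3
    shift 3
    # index() == 1 means the prefix occurs at the very start of the line;
    # substr() extracts the last length(suf) characters for the suffix test.
    # Note: awk -v interprets backslash escapes in the assigned values.
    awk -v len="$len" -v pre="$pre" -v suf="$suf" '
        length($0) == len &&
        index($0, pre) == 1 &&
        substr($0, length($0) - length(suf) + 1) == suf
    ' "$@"
}

# Example run on a made-up word list.
printf '%s\n' cabinetmaker photographer character cheeseburger |
    find_words 12 c er
```

Because the prefix and suffix are compared as plain strings, characters that are special in a regex (such as a dot) need no escaping.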
Upvotes: 0