smitelli

Reputation: 7575

Match specific length words, anchored, without doing magic math

Let's say I wanted to find all 12-letter words in /usr/share/dict/words that started with c and ended with er. Off the top of my head, a workable pattern could look something like:

grep -E '^c.{9}er$' /usr/share/dict/words

It finds:

cabinetmaker
calcographer
calligrapher
campanologer
campylometer
...

But that .{9} bothers me. It feels too magical: I have to subtract the combined length of the anchor characters (3, for c and er) from the length in the original constraint (12).

Is there any way to rewrite this regex so it doesn't require doing this calculation up front, allowing a literal 12 to be used directly in the pattern?

Upvotes: 3

Views: 97

Answers (4)

hwnd

Reputation: 70750

You can use the -x option, which selects only matches that span the entire line, and then filter the 12-character lines with a second, anchored grep.

grep -xE '.{12}' /usr/share/dict/words | grep '^c.*er$'


Or use the -P option (GNU grep), which interprets the pattern as a Perl-compatible regular expression, and check the length with a lookahead assertion.

grep -P '^(?=.{12}$)c.*er$' /usr/share/dict/words

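As a rough sketch of the two-stage -x approach, the pipeline can be wrapped in a small function and tried on a few sample words. The function name twelve_c_er is hypothetical, and the sample list is only for illustration. The second stage is anchored so that words merely containing c...er somewhere inside are rejected.

```shell
# Hypothetical helper: reads one word per line on stdin.
twelve_c_er() {
  # Stage 1: keep only lines that are exactly 12 characters long.
  # Stage 2: keep only lines that start with c and end with er.
  grep -xE '.{12}' | grep '^c.*er$'
}

# chatterboxes is 12 letters and contains "c...er" but ends in "es";
# cater has the right shape but the wrong length.
printf '%s\n' cabinetmaker chatterboxes cater calligrapher | twelve_c_er
```

With this sample input, only cabinetmaker and calligrapher should be printed.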

Upvotes: 2

Chris Seymour

Reputation: 85883

One approach with GNU sed:

$ sed -nr '/^.{12}$/{/^c.*er$/p}' words

With BSD sed (Mac OS) it would be:

$ sed -nE '/^.{12}$/{/^c.*er$/p;}' words

Upvotes: 0

Julian

Reputation: 757

I don't know grep so well, but some more advanced NFA RegEx implementations provide you with lookaheads and lookbehinds. If you can figure out any means to make those available for you, you could write:

^(?=c).{12}(?<=er)$

Maybe as a perl one-liner like this?

perl -ne 'print if /^(?=c).{12}(?<=er)$/' /usr/share/dict/words

Upvotes: 0

anubhava

Reputation: 786091

You can use awk as an alternative and avoid this calculation:

awk -v len=12 'length($1)==len && $1 ~ /^c.*er$/' file
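One possible generalization, sketched under the assumption that the prefix and suffix are plain strings rather than patterns: pass all three constraints as awk variables and compare with substr, so that no length arithmetic appears in the command. The names match_words, pre and suf are hypothetical.

```shell
# Hypothetical helper: match_words LENGTH PREFIX SUFFIX, reading words on stdin.
match_words() {
  awk -v len="$1" -v pre="$2" -v suf="$3" '
    # Keep lines of the requested length whose first characters equal pre
    # and whose last characters equal suf (literal comparison, no regex).
    length($0) == len &&
    substr($0, 1, length(pre)) == pre &&
    substr($0, length($0) - length(suf) + 1) == suf
  '
}

printf '%s\n' cabinetmaker chatterboxes cater calligrapher | match_words 12 c er
```

Against the real word list this would be run as match_words 12 c er < /usr/share/dict/words.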

Upvotes: 0
