GasparAlbert

Reputation: 35

Regex (grep) matching words made of exactly these letters

I'm trying to grep a list of words and match those made up only of certain letters. The order doesn't matter, but the quantity does: each letter may be used at most as many times as it is given. For example, given these letters:

{ a, a, r, f, y, h, l }

over the list

hello
far
hala
miss
cam

should return

far
hala

I don't know whether this can be done with regexes or whether I need to script something; any approach is welcome.

Upvotes: 0

Views: 595

Answers (3)

Ruud Helderman

Reputation: 11018

Alphabetically sort the characters in each word; then you can use a straightforward regex /^a?a?f?h?l?r?y?$/ (make sure the letters in the regex are in alphabetical order).

This AWK script filters the words on stdin (one word per line). It needs gawk, because asort is a gawk extension:

awk 'function sort(s,z){l=split(s,a,"");asort(a);while(l)z=a[l--]z;return z;}sort($0)~/^a?a?f?h?l?r?y?$/'
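For readers without gawk, the same sort-then-match idea can be sketched in Python. This is only an illustration of the approach above, not a replacement for the script; the names PATTERN and matches are made up for this example, and rejecting empty lines is an added assumption (the pattern itself also matches the empty string).

```python
import re

# Available letters in alphabetical order, each one optional: a, a, f, h, l, r, y
PATTERN = re.compile(r"a?a?f?h?l?r?y?")

def matches(word):
    """Return True if the word's letters, sorted, fit the optional-letter pattern."""
    if not word:
        return False  # assumption: an empty line does not count as a word
    return PATTERN.fullmatch("".join(sorted(word))) is not None

words = ["hello", "far", "hala", "miss", "cam"]
print([w for w in words if matches(w)])  # ['far', 'hala']
```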

Upvotes: 0

Bohemian

Reputation: 424983

Handle the quantity restrictions with negative lookaheads, one for each letter, and put word boundaries at either end of a simple character class:

\b(?!([^a\W]*a){3})(?!([^r\W]*r){2})(?!([^f\W]*f){2})(?!([^y\W]*y){2})(?!([^h\W]*h){2})(?!([^l\W]*l){2})[arfyhl]+\b

See live demo, including matching words within longer lines.

Including \W in each negated character class stops the lookahead from running past the end of the word.
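Writing one lookahead per letter by hand gets error-prone, so here is a rough Python sketch that generates a pattern of this shape from the letter list. build_pattern is a hypothetical helper invented for this example, and it assumes the letters are plain word characters with no special meaning inside a character class.

```python
import re
from collections import Counter

def build_pattern(letters):
    """Build a regex that allows each letter at most as often as it occurs in `letters`."""
    counts = Counter(letters)
    # One negative lookahead per letter: reject the word if that letter
    # occurs (allowed count + 1) times before the word ends.
    lookaheads = "".join(
        r"(?!(?:[^%s\W]*%s){%d})" % (c, c, n + 1)
        for c, n in sorted(counts.items())
    )
    char_class = "[" + "".join(sorted(counts)) + "]+"
    return re.compile(r"\b" + lookaheads + char_class + r"\b")

pattern = build_pattern("aarfyhl")
words = ["hello", "far", "hala", "miss", "cam"]
print([w for w in words if pattern.fullmatch(w)])  # ['far', 'hala']
```

Because the groups are non-capturing, pattern.findall(line) returns the matching words themselves when searching inside longer lines.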

Upvotes: 1

Ruud Helderman

Reputation: 11018

Same approach as Bohemian, just a bit shorter due to the use of back references:

\b(?!\w*([rfyhl])\w*\1|\w*([a])(?:\w*\2){2})[arfyhl]+\b

Fiddle: http://regex101.com/r/gO6dC4/1
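A quick way to try the back-reference version outside the fiddle is a few lines of Python. This is just a sketch; the names BACKREF and find_words are invented for the example. Since the pattern contains capturing groups, re.findall would return the group contents rather than the words, so finditer is used instead.

```python
import re

# The back-reference pattern from this answer, unchanged.
BACKREF = re.compile(r"\b(?!\w*([rfyhl])\w*\1|\w*([a])(?:\w*\2){2})[arfyhl]+\b")

def find_words(text):
    """Return every word in `text` that the pattern accepts."""
    # group(0) is the whole match; the capturing groups are ignored here.
    return [m.group(0) for m in BACKREF.finditer(text)]

print(find_words("hello far hala miss cam"))  # ['far', 'hala']
```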

Upvotes: 0
