Reputation: 35
I'm trying to grep a list of words and match those made up only of certain letters. The order doesn't matter, but the quantity does. For example, given these letters:
{ a, a, r, f, y, h, l }
running it over this list
hello
far
hala
miss
cam
should return only
far
hala
I don't know if this can be done with regexes or whether I need to script something; any approach is welcome.
Upvotes: 0
Views: 595
Reputation: 11018
Alphabetically sort the characters in each word; then you can use a straightforward regex /^a?a?f?h?l?r?y?$/
(make sure the letters in the regex are in alphabetical order).
This AWK script will filter the words on stdin (one word per line). It needs gawk, because asort() is a gawk extension:
awk 'function sort(s,z){l=split(s,a,"");asort(a);while(l)z=a[l--]z;return z;}sort($0)~/^a?a?f?h?l?r?y?$/'
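For anyone without gawk, the same sort-then-match idea can be sketched in Python. This is only an illustration of the approach above, not a replacement for it; the names `LETTERS`, `build_sorted_pattern` and `filter_words` are made up for this example, and the letter pool is the one from the question.

```python
import re

# Assumed pool of available letters, taken from the question: a, a, r, f, y, h, l.
LETTERS = "aarfyhl"

def build_sorted_pattern(letters):
    """Build ^a?a?f?h?l?r?y?$ : every pool letter, in sorted order, made optional."""
    body = "".join(re.escape(c) + "?" for c in sorted(letters))
    return re.compile("^" + body + "$")

def filter_words(words, letters=LETTERS):
    """Keep the non-empty words whose sorted characters match the pattern."""
    pattern = build_sorted_pattern(letters)
    return [w for w in words if w and pattern.match("".join(sorted(w)))]

print(filter_words(["hello", "far", "hala", "miss", "cam"]))
# prints ['far', 'hala']
```

Sorting the word first is what makes the fixed-order regex sufficient: "hala" becomes "aahl", which lines up with the optional letters in the pattern.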
Upvotes: 0
Reputation: 424983
Handle the quantity restrictions using negative lookaheads, one for each letter, with word boundaries at either end of a simple character class:
\b(?!([^a\W]*a){3})(?!([^r\W]*r){2})(?!([^f\W]*f){2})(?!([^y\W]*y){2})(?!([^h\W]*h){2})(?!([^l\W]*l){2})[arfyhl]+\b
See live demo, including matching words within longer lines.
The use of \W stops the lookahead from running off the end of the word.
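If the letter pool changes, writing one lookahead per letter by hand gets tedious. Below is a rough Python sketch that generates a regex of the same shape from the pool. The helper name `build_lookahead_pattern` is hypothetical, the sketch assumes the pool contains plain letters only (nothing that needs escaping inside a character class), and it uses non-capturing groups so that `findall` returns whole words.

```python
import re
from collections import Counter

def build_lookahead_pattern(letters):
    """Build \\b(?!...)(?!...)[letters]+\\b from a pool such as "aarfyhl"."""
    counts = Counter(letters)
    # One negative lookahead per letter: reject the word as soon as the
    # letter can be found one more time than the pool allows.
    guards = "".join(
        "(?!(?:[^%s\\W]*%s){%d})" % (c, c, n + 1) for c, n in counts.items()
    )
    char_class = "[" + "".join(counts) + "]+"
    return re.compile(r"\b" + guards + char_class + r"\b")

pattern = build_lookahead_pattern("aarfyhl")
print(pattern.findall("hello far hala miss cam"))
# prints ['far', 'hala']
```

Because each guard stops at \W, the pattern can be run over whole lines of text and still judges each word separately.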
Upvotes: 1
Reputation: 11018
Same approach as Bohemian, just a bit shorter due to the use of back references:
\b(?!\w*([rfyhl])\w*\1|\w*([a])(?:\w*\2){2})[arfyhl]+\b
Fiddle: http://regex101.com/r/gO6dC4/1
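One way to sanity-check a regex like this is to compare it against a plain multiset comparison. The Python sketch below does that on a handful of words; `fits_pool` is a hypothetical regex-free helper written for this comparison (it is a different technique, based on `collections.Counter`, not part of the answer), and the word list is just the question's list plus two extra cases.

```python
import re
from collections import Counter

# The back-reference regex from the answer above, unchanged.
BACKREF = re.compile(r"\b(?!\w*([rfyhl])\w*\1|\w*([a])(?:\w*\2){2})[arfyhl]+\b")

def fits_pool(word, pool="aarfyhl"):
    """Regex-free check: non-empty and no letter used more often than in the pool."""
    # Counter subtraction keeps only positive counts, i.e. the letters in excess.
    return bool(word) and not (Counter(word) - Counter(pool))

words = ["hello", "far", "hala", "miss", "cam", "hall", "halala"]
for w in words:
    # Both checks should agree on every word.
    assert (BACKREF.fullmatch(w) is not None) == fits_pool(w)

print([w for w in words if BACKREF.fullmatch(w)])
# prints ['far', 'hala']
```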
Upvotes: 0