user2132980
user2132980

Reputation: 225

grep to find words with unique letters

how to use grep to find occurrences of words from a dictionary file which have a given set of letters with the restriction that each letter occurs once and only once.

EG if the letters are abc then the expected output is:

cab


EDIT:

Given a dictionary file (that is a file containing one word per line such as /usr/share/dict/words on mac os x operating system) and a set of (unique) characters, I want to print out all of the dictionary file's words that contain each character of the input set once and only once. For example if the set of characters is {a,b,c} then print out all (3-letter) words that contain each character of the set.

I am looking, preferably, for a solution that uses just grep expressions.

Upvotes: 2

Views: 4627

Answers (4)

user2692769
user2692769

Reputation: 11

grep -E ^[abc]{3}.$ <Dictionary file> | grep -v -e a.*a -e b.*b -e c.*c

i.e. Find all three letter strings matching the input and pipe these through inverse grep to remove strings with double letters.

I'm using the '.' after {3} because my dictionary file is windows based so has an extra carriage return or line feed. So, that's probably not necessary.

Upvotes: 1

user2132980
user2132980

Reputation: 225

The solution I found involves using grep first to extract all n-letter words that contain only letters from the input set - although some letters might appear more than once, some may not appear; (again I am assuming that the input letters are unique). Then it does a series of 1-letter greps to make sure each letter occurs at least once. Because the words are of length n this ensures the word contains each letter once and only once. For example, if the input character set is (a,b,c} then the solution would be:

grep -E '^[abc]{3}$' /usr/share/dict/words | grep a | grep b | grep c

a simple bash script can be written which creates this grep string and executes it against the word file, using $1 as the input letter set. It might not be the most efficient method of generating the string, but as I am not familiar with sed or awk it does seem to solve my problem. The script I created is:

#!/bin/sh
slen=${#1}
g2="'^[$1]{$slen}\$'"
g3=""
ix1=0
while [ $ix1 -lt $slen ]
do
  g3="$g3 | grep ${1:$ix1:1}"
  ix1=$((ix1+1))
done
eval grep -E $g2 /usr/share/dict/words $g3

Upvotes: 0

Bohemian
Bohemian

Reputation: 425033

Given a series of letters, for example abc, you can convert each one to a lookahead, like this:

^(?=[^a]*a[^a]*)(?=[^b]*b[^b]*)(?=[^c]*c[^c]*)$

You may need to use the "extended regex" flag -E to use this regex with grep.


To create this regex from a string, you could use sed (an exercise for the reader)

Upvotes: 1

emallove
emallove

Reputation: 1567

Below is a Perl solution. Note, you'll need to add more words to the dictionary, and read input in to the $input variable. An array of valid words will end up in @results.

#!/usr/bin/env perl

use Data::Dumper;

my $input = "abc";

my @dictionary = qw(aaa aac aad aal aam aap aar aas aat aaw aba abc abd abf abg
  abh abm abn abo abr abs abv abw aca acc ace aci ack acl acp acs act acv ada adb
  adc add adf adh adl adn ado adp adq adr ads adt adw aea aeb aec aed aef aes aev
  afb afc afe aff afg afi afk afl afn afp aft afu afv agb agc agl agm agn ago agp
  ...

  PUT A REAL DICTIONARY HERE!

  ...
  zie zif zig zii zij zik zil zim zin zio zip zir zis zit ziu ziv zlm zlo zlx zma
  zme zmi zmu zna zoa zob zoe zog zoi zol zom zon zoo zor zos zot zou zov zoy zrn
  zsr zub zud zug zui zuk zul zum zun zuo zur zus zut zuz zva zwo zye zzz);

# Generate a lookahead expression for each character in the input word
my $regexp = join("", map { "(?=.*$_)" } split(//, $input));

my @results;
foreach my $word (@dictionary) {

  # If the size of the input doesn't match the dictionary word, skip to the
  # next word.
  if (length($input) != length($word)) {
    next;
  }

  if ($word =~ /$regexp/) {
    push(@results, $word);
  }
}

print Dumper @results;

Upvotes: 0

Related Questions