Seito

Reputation: 11

grep /usr/share/dict/words

I have a text file:

Butterfly

[tab][space]Bridge

space-12234

%%%^^%^%^^%

I'm trying to keep only the lines that contain dictionary words from the "words" file (/usr/share/dict/words).

The output should look like this:

Butterfly

[tab][space]Bridge

space-12234

I've tried:

words='/usr/share/dict/words'
grep ??  $words $1 > outputfile

Upvotes: 1

Views: 1569

Answers (2)

James Brown

Reputation: 37414

Here is one for awk. It prints exact matches as-is. For a partial match, it prints the line followed by the longest matching dictionary word (or one of them, if several tie), since the question doesn't define how partial matches should be handled:

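$ awk '
NR==FNR {                       # first file: load the dictionary
    words[tolower($1)]          # store each word lowercased as an array key
    next
}
{
    if(tolower($1) in words)    # first field is a dictionary word: exact match
        print
    else {
        for(i in words)         # otherwise test every word as a regex against the line
            if(($0~i)&&length(i)>length(best))
                best=i          # keep the longest word that matched so far
        if(best) {
            print $0,best       # partial match: print the line and the word found
            best=""
        }
    }
}' /usr/share/dict/words file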

Output (with your original data):

Butterfly
         Bridge
space-12234 space
ldfkalap kala
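
The partial-match branch above uses each dictionary word as a regex and compares it against the line without lowercasing the line. A minimal sketch of a variant, assuming a plain case-insensitive substring test is what is wanted: it lowercases the line and uses index() instead of a regex match. The tiny word list and input file are hypothetical stand-ins created in a temporary directory, not the real /usr/share/dict/words.

```shell
# Hypothetical stand-ins for the dictionary and the input file.
tmpdir=$(mktemp -d)
printf '%s\n' butterfly bridge space ace > "$tmpdir/words"
printf '%s\n' 'Butterfly' 'space-12234' '%%%^^%^%^^%' > "$tmpdir/input"

awk_out=$(awk '
NR == FNR { words[tolower($1)]; next }         # first file: load the dictionary
{
    if (tolower($1) in words) { print; next }  # exact match: print the line as-is
    line = tolower($0); best = ""
    for (w in words)                           # longest word that is a substring of the line
        if (index(line, w) && length(w) > length(best))
            best = w
    if (best != "") print $0, best             # partial match: line plus the word found
}' "$tmpdir/words" "$tmpdir/input")

printf '%s\n' "$awk_out"
rm -rf "$tmpdir"
```

With this sample data the sketch prints Butterfly unchanged and "space-12234 space", and drops the line of punctuation.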

Upvotes: 0

choroba

Reputation: 241928

You can use the -f option:

-f FILE, --file=FILE

Obtain patterns from FILE, one per line. If this option is used multiple times or is combined with the -e (--regexp) option, search for all patterns given. The empty file contains zero patterns, and therefore matches nothing.

grep -f "$words" "$1" > outputfile

You might also be interested in -w and -F:

-w, --word-regexp

Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore. This option has no effect if -x is also specified.

-F, --fixed-strings

Interpret PATTERN as a list of fixed strings (instead of regular expressions), separated by newlines, any of which is to be matched.
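
These options can be combined. A minimal sketch, assuming a small hypothetical word list in a temporary directory stands in for /usr/share/dict/words, and adding -i (ignore case), which is not mentioned above, so that "Butterfly" matches a lowercase dictionary entry:

```shell
# Hypothetical stand-ins for the dictionary and the input file.
tmpdir=$(mktemp -d)
printf '%s\n' butterfly bridge space > "$tmpdir/words"
printf '%s\n' 'Butterfly' "$(printf '\t Bridge')" 'space-12234' '%%%^^%^%^^%' > "$tmpdir/input"

# -F: treat each dictionary line as a fixed string, not a regex
# -w: match whole words only
# -i: ignore case
# -f: read the patterns from the word list
result=$(grep -Fwi -f "$tmpdir/words" "$tmpdir/input")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

On this sample input the sketch keeps the Butterfly, Bridge and space-12234 lines and drops the line of punctuation. In the asker's script, "$tmpdir/words" would be "$words" and "$tmpdir/input" would be "$1".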

Upvotes: 3
