Reputation: 11
I have a text file:
Butterfly
[tab][space]Bridge
space-12234
%%%^^%^%^^%
I'm trying to keep only the lines that contain dictionary words from the "words" file (/usr/share/dict/words).
Output would look like this:
Butterfly
[tab][space]Bridge
space-12234
I've tried
words='/usr/share/dict/words'
grep ?? $words $1 > outputfile
Upvotes: 1
Views: 1569
Reputation: 37414
Here is one for awk. It prints exact matches as-is, and prints partial matches followed by one of the longest matching dictionary words (the question does not define how partial matches should be handled):
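$ awk '
NR==FNR {                          # first file: load the word list
    words[tolower($1)]
    next
}
{
    if(tolower($1) in words)       # exact (case-insensitive) match on the first field
        print
    else {
        for(i in words)            # otherwise look for the longest word matching the line
            if(($0~i)&&length(i)>length(best))
                best=i
        if(best) {
            print $0,best
            best=""
        }
    }
}' /usr/share/dict/words file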
Output (with your original data):
Butterfly
Bridge
space-12234 space
ldfkalap kala
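To try the idea without scanning the full dictionary, here is a minimal sketch on a tiny, made-up word list. The file names, the demo directory, and the four sample words are assumptions for illustration only. It also swaps the regex test for index() on a lowercased copy of the line, so dictionary entries are treated as literal strings and the partial match becomes case-insensitive; that is a variation on the script above, not the same code.

```shell
#!/bin/sh
# Hypothetical demo files; the real word list would be /usr/share/dict/words.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
words="$demo_dir/words"
input="$demo_dir/input.txt"
result="$demo_dir/result.txt"

printf '%s\n' butterfly bridge space ace > "$words"
{
    printf '%s\n' 'Butterfly'
    printf '\t Bridge\n'
    printf '%s\n' 'space-12234' '%%%^^%^%^^%'
} > "$input"

awk '
NR == FNR { words[tolower($1)]; next }      # first file: load the word list
{
    if (tolower($1) in words) { print; next }   # exact match on the first field
    line = tolower($0)
    best = ""
    for (w in words)                        # literal substring test, keep the longest hit
        if (index(line, w) && length(w) > length(best))
            best = w
    if (best != "") print $0, best
}' "$words" "$input" > "$result"

cat "$result"
```

Lines with no dictionary word at all (the %%%^^%^%^^% line here) produce no output. Note that the loop still visits every dictionary entry for each non-exact line, so it may be slow with a large word list.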
Upvotes: 0
Reputation: 241928
You can use the -f option:
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. If this option is used multiple times or is combined with the -e (--regexp) option, search for all patterns given. The empty file contains zero patterns, and therefore matches nothing.
grep -f "$words" "$1" > outputfile
You might also be interested in -w and -F:
-w, --word-regexp
Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore. This option has no effect if -x is also specified.
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings (instead of regular expressions), separated by newlines, any of which is to be matched.
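Putting these options together, a minimal sketch might look like the following. The demo directory, file names, and three-word list are assumptions made up for illustration; with the real dictionary you would point -f at /usr/share/dict/words instead. The -i flag is added here because the sample data capitalizes words that a word list may store in lowercase.

```shell
#!/bin/sh
# Hypothetical demo files standing in for the dictionary and the input file.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
words="$tmpdir/words"
input="$tmpdir/input.txt"
outputfile="$tmpdir/outputfile"

printf '%s\n' butterfly bridge space > "$words"
{
    printf '%s\n' 'Butterfly'
    printf '\t Bridge\n'
    printf '%s\n' 'space-12234' '%%%^^%^%^^%'
} > "$input"

# -f: read patterns from the word list, one per line
# -F: treat each pattern as a fixed string, not a regular expression
# -w: only accept matches that form whole words
# -i: ignore case, so "Butterfly" matches "butterfly"
grep -i -w -F -f "$words" "$input" > "$outputfile"

cat "$outputfile"
```

Without -w, any very short entries in the word list would match as substrings of almost every line, so the whole-word test is probably what you want here.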
Upvotes: 3