Reputation: 97
Hope somebody can help.
I have two files.
file-a looks like:
bank
sofa
table
file-b is a "script". For the example it looks like:
abcdfg bank
kitchen abcdfg
uhuh sofa :=
I need only the words in file-a that do not match any word in file-b, and I need them printed to file-c.
I know how to do this within a single file, but not how to compare it against another file.
I appreciate your help.
Upvotes: 2
Reputation: 212248
This won't win code golf, but it makes only one pass over the data and doesn't waste any CPU time sorting:
awk '{ for( i=1; i<=NF; i++ ) if( NR==FNR ) w[$i]=1; else delete w[$i] }
END{ for( i in w ) print i}' file-a file-b > file-c
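A commented, self-contained sketch of the same one-pass idea, assuming a POSIX shell with awk and mktemp available. It recreates the question's sample files in a scratch directory. The awk program is restructured to use `next` instead of if/else, but it does the same thing: remember every word of the first file, then delete any word that shows up in the second.

```shell
#!/bin/sh
# Sketch: recreate the question's sample files in a scratch directory,
# then run the one-pass awk program with comments.
set -e
dir=$(mktemp -d)
cd "$dir"
printf '%s\n' bank sofa table > file-a
printf '%s\n' 'abcdfg bank' 'kitchen abcdfg' 'uhuh sofa :=' > file-b

awk '
  # While reading the first file (NR == FNR), remember every word in it.
  NR == FNR { for (i = 1; i <= NF; i++) w[$i] = 1; next }
  # For every later file, forget any word that also appears there.
  { for (i = 1; i <= NF; i++) delete w[$i] }
  # Whatever is left never appeared in file-b (iteration order is unspecified).
  END { for (i in w) print i }
' file-a file-b > file-c

cat file-c   # prints: table
```

Because `for (i in w)` visits keys in no particular order, piping the awk output through `sort` before `> file-c` gives a stable result if the order matters.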
Note that the speedup is substantial. With both file-a and file-b set to /usr/share/dict/words, this awk solution ran on my system in 1.578s. Time for John Lawrence's fgrep solution: 9.157s. Time for Zsolt's fgrep | uniq: 4.951s.
Upvotes: 1
Reputation: 2923
fgrep -of file-a file-b | fgrep -vf - file-a
This first finds all the words from file-a that occur in file-b, then runs fgrep again to print the words of file-a that are not in that list.
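fgrep matches substrings, so a word of file-a counts as "present" even when it only appears inside a longer word of file-b. A sketch of a whole-word variant follows; `-w` and `-x` are my additions, not part of the answer above, and `grep -F` is the standard spelling of `fgrep`. It assumes a grep that accepts `-f -` to read patterns from stdin, as GNU grep does. The extra `tables` line in file-b is hypothetical, added only to show the difference.

```shell
#!/bin/sh
# Sketch: whole-word version of the two-grep pipeline.
set -e
dir=$(mktemp -d)
cd "$dir"
printf '%s\n' bank sofa table > file-a
# Hypothetical extra line "tables" added to the question's sample file-b.
printf '%s\n' 'abcdfg bank' 'kitchen abcdfg' 'uhuh sofa :=' 'tables' > file-b

# 1st grep: print each word of file-a that occurs as a whole word in file-b.
# 2nd grep: print the lines of file-a that do not exactly equal any of those.
grep -owF -f file-a file-b | grep -vxF -f - file-a > file-c

cat file-c   # prints: table
```

With the original substring pipeline, `tables` would make `table` count as found and file-c would come out empty.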
Upvotes: 1
Reputation: 1
> fileC; cat fileA | while read -r ZWORD ; do fgrep -q "$ZWORD" fileB || echo "$ZWORD" >>fileC; done
$ cat fileC
table
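A self-contained sketch of the same loop with a few robustness tweaks (my choices, not the answer's): `read -r` keeps backslashes intact, the variable is quoted, `--` protects words that start with `-`, and the output is redirected once for the whole loop instead of per echo. It still matches substrings and still starts one grep process per line of fileA.

```shell
#!/bin/sh
# Sketch: the read loop, reading fileA directly instead of via cat.
set -e
dir=$(mktemp -d)
cd "$dir"
printf '%s\n' bank sofa table > fileA
printf '%s\n' 'abcdfg bank' 'kitchen abcdfg' 'uhuh sofa :=' > fileB

while IFS= read -r word; do
  # -q: no output, only the exit status; -F: fixed string, not a regex;
  # --: end of options, in case a word starts with "-".
  grep -qF -- "$word" fileB || printf '%s\n' "$word"
done < fileA > fileC

cat fileC   # prints: table
```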
Clues:
> fileC creates an empty file.
read reads a line of fileA and puts it into the variable ZWORD.
fgrep does not evaluate $ZWORD as a regular expression.
-q is quiet: fgrep prints nothing and only sets its exit status.
|| executes the following command when the preceding command fails.
Upvotes: 0
Reputation: 360095
join -1 1 -2 2 -v 1 <(sort file-a) <(sort -k2,2 file-b) > file-c
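How the join works: `-1 1 -2 2` joins field 1 of file-a against field 2 of file-b, and `-v 1` prints only the lines of the first input that have no partner. Both inputs must be sorted on their join fields, which is what the two `sort` calls do. Because only field 2 of file-b is compared, this relies on the word sitting in the second column, as it does in the sample. The sketch below is a variant of my own, not the answer's: it splits file-b into one word per line first, and uses temporary files instead of bash process substitution so it runs under plain sh.

```shell
#!/bin/sh
# Sketch: join against every word of file-b, not just column 2.
set -e
dir=$(mktemp -d)
cd "$dir"
printf '%s\n' bank sofa table > file-a
printf '%s\n' 'abcdfg bank' 'kitchen abcdfg' 'uhuh sofa :=' > file-b

# join needs both inputs sorted on the join field (here: field 1 of each).
sort -u file-a > a.sorted
# Turn spaces and tabs into newlines (squeezing repeats), one word per line.
tr -s ' \t' '\n\n' < file-b | sort -u > b.words

# -v 1: print only the lines of the first file with no partner in the second.
join -v 1 a.sorted b.words > file-c

cat file-c   # prints: table
```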
Upvotes: 0
Reputation: 51603
In two steps:
fgrep -f file-a -o file-b > this_words_from_file-a_are_in_file-b
sort file-a this_words_from_file-a_are_in_file-b | uniq -u
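(The first command searches file-b for the words of file-a and outputs only the matches; sort and uniq -u then filter those words out, because a word that was found appears at least twice in the combined list and uniq -u keeps only lines that occur exactly once.)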
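A sketch of the same two steps as one pipeline, without the intermediate file. It relies on sort reading standard input when given `-` as a file operand. One caveat inherited from the original: a word listed twice in file-a is dropped by `uniq -u` even if file-b never mentions it, so deduplicate file-a first if it may contain repeats.

```shell
#!/bin/sh
# Sketch: two-step approach as a single pipeline.
set -e
dir=$(mktemp -d)
cd "$dir"
printf '%s\n' bank sofa table > file-a
printf '%s\n' 'abcdfg bank' 'kitchen abcdfg' 'uhuh sofa :=' > file-b

# Step 1: every word of file-a found in file-b, once per occurrence.
# Step 2: sort those together with file-a ("-" = stdin); uniq -u keeps only
# lines that occur exactly once, i.e. words of file-a that were never found.
grep -oF -f file-a file-b | sort - file-a | uniq -u > file-c

cat file-c   # prints: table
```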
Upvotes: 1