Quinox

Reputation: 97

sed, awk, grep: matching words across two files

Hope somebody can help.

I have two files. file-a looks like this:

    bank
    sofa
    table

file-b is a "script". For this example it looks like:

    abcdfg bank
    kitchen abcdfg
    uhuh sofa :=

I need to find only the words in file-a that do not match any word in file-b, and print them to file-c.

I know how to do this within a single file, but not how to compare one file against another.

I appreciate your help.

Upvotes: 2

Views: 552

Answers (5)

William Pursell

Reputation: 212248

This won't win code golf, but it makes only one pass on the data and doesn't waste any cpu time sorting:

    awk '{ for( i=1; i<=NF; i++ ) if( NR==FNR ) w[$i]=1; else delete w[$i] }
         END{ for( i in w ) print i}' file-a file-b > file-c

Note that the speedup is substantial. With /usr/share/dict/words used as both file-a and file-b, this awk solution ran on my system in 1.578s. John Lawrence's fgrep solution took 9.157s, and Zsolt's fgrep | uniq took 4.951s.
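
For readers less familiar with awk, the same one-pass idea can be spelled out with comments. This is only a sketch: it recreates the sample files from the question in a scratch directory (the mktemp directory and the variable name word are my own additions), and it splits the single rule into two so the file-a and file-b phases are easier to see.

```shell
# Recreate the sample files from the question in a scratch directory.
cd "$(mktemp -d)"
printf '%s\n' bank sofa table > file-a
printf '%s\n' 'abcdfg bank' 'kitchen abcdfg' 'uhuh sofa :=' > file-b

awk '
    NR == FNR {                          # true only while reading the first file (file-a)
        for (i = 1; i <= NF; i++) w[$i] = 1
        next
    }
    {                                    # second file (file-b): forget every word seen here
        for (i = 1; i <= NF; i++) delete w[$i]
    }
    END { for (word in w) print word }   # what is left never appeared in file-b
' file-a file-b > file-c

cat file-c
```

The order in which `for (word in w)` visits the array is unspecified, so pipe the result through sort if file-c needs a stable order.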

Upvotes: 1

John Lawrence

Reputation: 2923

    fgrep -of file-a file-b | fgrep -vf - file-a

This first finds all the words from file-a that occur in file-b, then uses fgrep again to print the lines of file-a that are not in that list.
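
One thing to keep in mind is that fgrep matches substrings, so table in file-a would count as present if file-b contained tablet. A possible whole-word variant is sketched below, assuming a grep that supports -w, -x and -o (GNU grep does). It uses grep -F, the spelling newer GNU grep releases recommend in place of fgrep. The extra "tablet lamp" line is not in the question; I added it to show the difference.

```shell
# Sample data from the question, plus one hypothetical line ("tablet lamp").
cd "$(mktemp -d)"
printf '%s\n' bank sofa table > file-a
printf '%s\n' 'abcdfg bank' 'kitchen abcdfg' 'uhuh sofa :=' 'tablet lamp' > file-b

# Step 1: print each file-a word that occurs in file-b as a whole word (-w),
#         one match per line (-o).
# Step 2: print the lines of file-a that are not exactly (-x) one of those words.
grep -Fwo -f file-a file-b | grep -Fvx -f - file-a > file-c

cat file-c
```

With the substring-matching original, the tablet line would hide table and file-c would come out empty for this input.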

Upvotes: 1

ebersphi

Reputation: 1

    $ > fileC; cat fileA | while read ZWORD ; do fgrep -q "$ZWORD" fileB || echo "$ZWORD" >>fileC; done
    $ cat fileC
    table

Clues:

  • > fileC creates an empty file
  • read reads a line of fileA and puts it into variable ZWORD
  • fgrep does not evaluate $ZWORD as a regular expression
  • -q is quiet
  • || executes the next command only when the preceding command fails

Upvotes: 0

Dennis Williamson

Reputation: 360095

    join -1 1 -2 2 -v 1 <(sort file-a) <(sort -k2,2 file-b) > file-c

Upvotes: 0

Zsolt Botykai

Reputation: 51603

In two steps:

    fgrep -f file-a -o file-b > this_words_from_file-a_are_in_file-b
    sort file-a this_words_from_file-a_are_in_file-b | uniq -u

(The first command searches file-b for the words from file-a and outputs only the matches. sort and uniq -u then discard every word that appears more than once, leaving the words that occur only in file-a.)
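
The two steps can be joined into one pipeline so that no intermediate file is left behind. This is a sketch that assumes file-a contains no repeated words, since uniq -u would drop those as well. The -w option (whole-word matching) is my addition and is not part of the original commands.

```shell
# Sample data from the question.
cd "$(mktemp -d)"
printf '%s\n' bank sofa table > file-a
printf '%s\n' 'abcdfg bank' 'kitchen abcdfg' 'uhuh sofa :=' > file-b

# grep prints the file-a words found in file-b; sort merges that list (read
# from stdin via "-") with file-a; uniq -u keeps only lines that occur once.
grep -Fwo -f file-a file-b | sort - file-a | uniq -u > file-c

cat file-c
```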

Upvotes: 1
