gbt

Reputation: 799

grep a list against a multi-column file and get only the fully matching lines

Not sure how to ask this question, but an example should clarify. Suppose I have this file:

$ cat intoThat 
a   b
a   h
a   l
a   m
b   c
b   d
b   m
c   b
c   d
c   f
c   g
c   p
d   h
d   f
d   p

and this list:

$ cat grepThis
a
b
c
d

Now I would like to grep the entries of grepThis in intoThat, and I would do this:

$ grep -wf grepThis intoThat

which gives output like this:

**a b**
a   h
a   l
a   m
**b c**
**b d**
b   m
**c b**
**c d**
c   f
c   g
c   p
d   h
d   f
d   p

The asterisks highlight the lines that I would like grep to return. These are the lines that have a full match, i.e. both columns appear in grepThis. How can I tell grep (or awk, or whatever) to return only these lines? Of course, some lines may not match any pattern at all; e.g. the intoThat file may contain other letters like g, h, l, s, t, etc.

Upvotes: 0

Views: 287

Answers (1)

αғsнιη

Reputation: 2761

With awk, you could do:

awk 'NR==FNR{ seen[$0]++; next } ($1 in seen && $2 in seen)' grepThis intoThat
a   b
b   c
b   d
c   b
c   d
  • NR is the total number of records read so far. It is 1 for the first record and keeps incrementing across all input files until every record (line) has been read.
  • FNR is the number of records read so far in the current file. It also starts at 1, but it resets to 1 at the start of each new input file.
  • So NR == FNR is true only while awk is reading the first input file (here grepThis), and the block that follows it runs on the first file only.

  • seen is an awk associative array (you can name it whatever you like), keyed on the whole line $0; the value counts how many times each line occurred. (The same idiom is commonly used to remove duplicate records in awk.)

  • The next statement skips the rest of the commands for the current record and moves on to the next record, so the remaining code only actually runs for the file(s) after the first.

  • The final pattern, ($1 in seen && $2 in seen), checks whether both column $1 and column $2 are present in the array; if so, the line is printed.
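
To see the points above in action, here is a minimal, self-contained sketch that recreates the two files from the question in a scratch directory and runs the command. The variable names tmpdir, boundaries and result are only illustrative, and mktemp -d is assumed to be available (it is on common Linux and macOS systems).

```shell
# Scratch directory so the sketch does not touch any existing files.
tmpdir=$(mktemp -d)

# The list of wanted values (the first file awk reads).
cat > "$tmpdir/grepThis" <<'EOF'
a
b
c
d
EOF

# The two-column file from the question (the second file awk reads).
cat > "$tmpdir/intoThat" <<'EOF'
a   b
a   h
a   l
a   m
b   c
b   d
b   m
c   b
c   d
c   f
c   g
c   p
d   h
d   f
d   p
EOF

# Print NR and FNR at the first record of each file:
# NR keeps counting across files, FNR restarts at 1.
boundaries=$(awk 'FNR == 1 { print NR, FNR }' "$tmpdir/grepThis" "$tmpdir/intoThat")
printf '%s\n' "$boundaries"
# prints:
# 1 1
# 5 1

# The filter itself: load the list while NR == FNR, then keep only the
# lines of the second file whose two columns are both in the list.
result=$(awk 'NR==FNR { seen[$0]++; next } ($1 in seen && $2 in seen)' \
    "$tmpdir/grepThis" "$tmpdir/intoThat")
printf '%s\n' "$result"
# prints:
# a   b
# b   c
# b   d
# c   b
# c   d

rm -rf "$tmpdir"
```

Note that keying the array on $0 assumes each line of grepThis is exactly one value with no surrounding whitespace. If the list might contain trailing spaces, keying on $1 instead (seen[$1]++) would probably be more forgiving.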

Upvotes: 3
