olala

Reputation: 4446

How to grep exact matches using a list of strings from a file

I have a file A containing a single column of strings, like this:

ADAMTS9
AIP
....

I want to use the strings in file A to grep the lines that contain them in file B, which looks like this:

chr13   50571142        50592603        ADAMTS9  21461   +
chr19   50180408        50191707        AIP   11299   +
chr19   50180408        50193000        AIP-S1   6532    -

I have used:

grep -F -i -w -f A B 

and it matched all three lines above. However, I only want the first two lines to match; the third line contains AIP-S1, which isn't an exact match for AIP.

Can someone tell me how to fix the command to do that?

Thanks.

Upvotes: 4

Views: 2498

Answers (2)

glenn jackman

Reputation: 246807

You are using -w to do whole-word searching. The trouble is that grep only treats letters, digits and the underscore as word characters, so in "AIP-S1" the "-" character ends the word and "AIP" is found as a whole word.
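To see this for yourself, here is a small sketch that rebuilds the sample files from the question in a scratch directory (the directory and the variable name `w_matches` are just for illustration) and counts how many lines the original command matches:

```shell
# Recreate the question's sample files in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'ADAMTS9' 'AIP' > "$tmpdir/A"
printf '%s\n' \
  'chr13   50571142        50592603        ADAMTS9  21461   +' \
  'chr19   50180408        50191707        AIP   11299   +' \
  'chr19   50180408        50193000        AIP-S1   6532    -' > "$tmpdir/B"

# With -w, "-" is not a word character, so "AIP" also matches inside "AIP-S1".
# -c prints the number of matching lines instead of the lines themselves.
w_matches=$(grep -c -F -i -w -f "$tmpdir/A" "$tmpdir/B")
echo "lines matched with -w: $w_matches"

rm -r "$tmpdir"
```

All three lines are counted, which is the behaviour described in the question.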

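This crazy command transforms the patterns file so that each pattern must be delimited by whitespace or by the start or end of the line (note that without -F the entries in A are treated as regular expressions):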

$ grep -if <(sed 's/^/\\(^\\|[[:space:]]\\)/; s/$/\\($\\|[[:space:]]\\)/' A) B
chr13   50571142        50592603        ADAMTS9  21461   +
chr19   50180408        50191707        AIP   11299   +
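If the backslashes are hard to read, roughly the same idea can be sketched with extended regular expressions. This is only a variant under some assumptions: your grep supports -E, the names in A contain no regex metacharacters, and the intermediate file name `patterns` is made up for the example.

```shell
# Recreate the question's sample files in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'ADAMTS9' 'AIP' > "$tmpdir/A"
printf '%s\n' \
  'chr13   50571142        50592603        ADAMTS9  21461   +' \
  'chr19   50180408        50191707        AIP   11299   +' \
  'chr19   50180408        50193000        AIP-S1   6532    -' > "$tmpdir/B"

# Wrap each name so it must be bounded by whitespace or a line edge.
# In the sed replacement, "&" stands for the whole original line.
sed 's/.*/(^|[[:space:]])&([[:space:]]|$)/' "$tmpdir/A" > "$tmpdir/patterns"

# -E enables extended regexes, -i keeps the match case-insensitive.
exact=$(grep -E -i -f "$tmpdir/patterns" "$tmpdir/B")
printf '%s\n' "$exact"

rm -r "$tmpdir"
```

Each line of `patterns` looks like `(^|[[:space:]])AIP([[:space:]]|$)`, so "AIP-S1" no longer qualifies because "AIP" is followed by "-" rather than whitespace.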

Upvotes: 2

anubhava

Reputation: 785128

You can use awk instead:

awk 'FNR==NR{a[$1];next} ($4 in a)' A B
chr13   50571142        50592603        ADAMTS9  21461   +
chr19   50180408        50191707        AIP   11299   +

Or, to search in any field (printing each matching line only once):

awk 'FNR==NR{a[$1];next} {for (i=1; i<=NF; i++) if ($i in a) {print; break}}' A B
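One thing to keep in mind is that the original grep used -i, while awk array lookups are case-sensitive. If you need the case-insensitive behaviour, a possible sketch is to normalise both sides with toupper(). The lowercase entry in A and the names `names` and `ci` are only there for the demonstration.

```shell
# Sample data; the first name is deliberately lowercase to show the effect.
tmpdir=$(mktemp -d)
printf '%s\n' 'adamts9' 'AIP' > "$tmpdir/A"
printf '%s\n' \
  'chr13   50571142        50592603        ADAMTS9  21461   +' \
  'chr19   50180408        50191707        AIP   11299   +' \
  'chr19   50180408        50193000        AIP-S1   6532    -' > "$tmpdir/B"

# First file: remember each name in upper case.
# Second file: print the line if the upper-cased 4th field is a stored name.
ci=$(awk 'FNR==NR { names[toupper($1)]; next } (toupper($4) in names)' \
  "$tmpdir/A" "$tmpdir/B")
printf '%s\n' "$ci"

rm -r "$tmpdir"
```

Because the comparison is on the whole field, "AIP-S1" is still excluded, and "adamts9" in A now matches "ADAMTS9" in B.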

Upvotes: 3
