Reputation: 4446
I have a file A containing a single column of strings, like this:
ADAMTS9
AIP
....
I want to use the strings in file A to grep for the lines in file B that contain them. File B looks like this:
chr13 50571142 50592603 ADAMTS9 21461 +
chr19 50180408 50191707 AIP 11299 +
chr19 50180408 50193000 AIP-S1 6532 -
I have used:
grep -F -i -w -f A B
and it matched all three lines above. However, I only want the first two lines to match; the third line contains AIP-S1, which isn't an exact match for AIP.
Can someone tell me how to fix the command to do that?
Thanks.
Upvotes: 4
Views: 2498
Reputation: 246807
You are using -w to do whole-word searching. The trouble is that in "AIP-S1" the "-" character is not a word character (word characters are letters, digits and the underscore), so "AIP" is found as a whole word.
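A quick way to see this behaviour in isolation is to feed grep a few made-up names on standard input. This is only a sketch: the three input names and the variable `word_matches` are illustrative and not part of the original files.

```shell
# Three candidate names; only the first is exactly "AIP".
word_matches=$(printf 'AIP\nAIP-S1\nAIPX\n' | grep -w 'AIP')
printf '%s\n' "$word_matches"
# AIP
# AIP-S1
```

"AIPX" is rejected because "X" is a word character, while "AIP-S1" is accepted because "-" is not.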
This crazy command transforms the patterns file so that each pattern must be delimited by whitespace or by the start or end of the line (a "word-boundary-like" match):
$ grep -if <(sed 's/^/\\(^\\|[[:space:]]\\)/; s/$/\\($\\|[[:space:]]\\)/' A) B
chr13 50571142 50592603 ADAMTS9 21461 +
chr19 50180408 50191707 AIP 11299 +
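To make the sed step less opaque, here is a sketch that writes the generated patterns to a file instead of using process substitution, so it also runs in shells without `<(...)`. The scratch directory and the names `tmp`, `patterns` and `matches` are hypothetical, and `\|` in a basic regular expression is assumed to be supported by your grep (it is a GNU extension). Note that the names are now treated as regular expressions rather than fixed strings, so a name containing a character such as "." would need escaping.

```shell
# Hypothetical scratch directory holding the sample data from the question.
tmp=$(mktemp -d)
printf 'ADAMTS9\nAIP\n' > "$tmp/A"
printf '%s\n' \
  'chr13 50571142 50592603 ADAMTS9 21461 +' \
  'chr19 50180408 50191707 AIP 11299 +' \
  'chr19 50180408 50193000 AIP-S1 6532 -' > "$tmp/B"

# Prefix each name with "start of line or whitespace" and
# suffix it with "end of line or whitespace".
sed 's/^/\\(^\\|[[:space:]]\\)/; s/$/\\($\\|[[:space:]]\\)/' "$tmp/A" > "$tmp/patterns"
cat "$tmp/patterns"

# Use the generated file as the pattern list (case-insensitive, as before).
matches=$(grep -i -f "$tmp/patterns" "$tmp/B")
printf '%s\n' "$matches"
```

The first generated pattern is `\(^\|[[:space:]]\)ADAMTS9\($\|[[:space:]]\)`, and the AIP-S1 line is no longer matched because "AIP" there is followed by "-" rather than whitespace or the end of the line.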
Upvotes: 2
Reputation: 785128
You can use awk instead:
awk 'FNR==NR{a[$1];next} ($4 in a)' A B
chr13 50571142 50592603 ADAMTS9 21461 +
chr19 50180408 50191707 AIP 11299 +
Or, to search in any field (printing each matching line only once):
awk 'FNR==NR{a[$1];next} {for (i=1; i<=NF; i++) if ($i in a) {print; break}}' A B
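The original grep command used -i, whereas the awk array lookup is case-sensitive. If case-insensitive matching is still wanted, one possible sketch is to normalise both sides with `toupper`. The scratch files and the names `tmp`, `names` and `ci_matches` are hypothetical, and the lowercase entry in A is there only to show the effect.

```shell
# Hypothetical scratch directory; file A deliberately mixes case.
tmp=$(mktemp -d)
printf 'adamts9\nAIP\n' > "$tmp/A"
printf '%s\n' \
  'chr13 50571142 50592603 ADAMTS9 21461 +' \
  'chr19 50180408 50191707 AIP 11299 +' \
  'chr19 50180408 50193000 AIP-S1 6532 -' > "$tmp/B"

# First file: store each name upper-cased as an array key.
# Second file: print lines whose upper-cased 4th field is one of those keys.
ci_matches=$(awk 'FNR==NR { names[toupper($1)]; next } toupper($4) in names' "$tmp/A" "$tmp/B")
printf '%s\n' "$ci_matches"
```

Because the whole field is compared, "AIP-S1" still does not match "AIP".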
Upvotes: 3