Ascaris

Reputation: 83

grep -w -f is not returning all matches from list

I am trying to use a list that looks like this:

List file:

1mAF
2mAF
4mAF
7mAF
9mAF
10mAF
11mAF
13mAF
18mAF
27mAF
33mAF
36mAF
37mAF
38mAF
39mAF
40mAF
41mAF
45mAF
46mAF
47mAF
49mAF
57mAF
58mAF
60mAF
61mAF
62mAF
63mAF
64mAF
67mAF
82mAF
86mAF
87mAF
95mAF
96mAF

to grab out lines that contain a word-level match in a tab delimited file that looks like this:

File_of_interest:

11mAF-NODE_111-g7687-JEFF-tig00000037_arrow-g7396-AFLA_058530   11mAF   cluster63
17mAF-NODE_343-g9350    17mAF   cluster07
18mAF-NODE_34-g3647-JEFF-tig00000037_arrow-g7396-AFLA_058530    18mAF   cluster20
22mAF-NODE_36-g3735 22mAF   cluster28
36mAF-NODE_107-g7427    36mAF   cluster77
45mAF-NODE_151-g9067    45mAF   cluster14
47mAF-NODE_30-g3242-JEFF-tig00000037_arrow-g7396-AFLA_058530    47mAF   cluster21
67mAF-NODE_54-g4372 67mAF   cluster06
69mAF-NODE_27-g2754 69mAF   cluster39
71mAF-NODE_44-g4178 71mAF   cluster25
73mAF-NODE_47-g4895 73mAF   cluster57
78mAF-NODE_4-g688   78mAF   cluster53

but when I do grep -w -f list file_of_interest these are the only ones I get:

18mAF-NODE_34-g3647-JEFF-tig00000037_arrow-g7396-AFLA_058530    18mAF   cluster20
36mAF-NODE_107-g7427    36mAF   cluster77
45mAF-NODE_151-g9067    45mAF   cluster14

This misses several values that are in the original list. For example, "67mAF" is in both the list and the file, but its line isn't returned.

I have tried removing everything after "mAF" in the list and trying again, with no change. I have rewritten the list in a completely new file, to no avail. Oddly, I get more of them if I "sort" the list into a new file and then do the grep, but I still don't get all of them. I have also removed all invisible characters using sed (sed $'s/[^[:print:]\t]//g'). No change.

I am on OSX and both files were created on OSX, but normally grep -f -w works in the fashion I'm describing above.

I am completely flummoxed. I thought grep -w -f would look for all word-level matches of the items in the list file within the target file... am I wrong?

Thanks!

Upvotes: 0

Views: 84

Answers (1)

user2849202

My guess is that at least one of these files originates from a Windows machine and has CRLF line endings. file(1) can tell you: it reports "with CRLF line terminators" for such files. If that is the case, run:

fromdos FILE

or, alternatively:

dos2unix FILE
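
If neither tool is installed, the same check and cleanup can be sketched with grep and tr alone. This is only an illustration of the CRLF guess above, not a confirmed diagnosis of the asker's files: the temp directory and the names list_crlf, list_lf and target are made up for the demo, and -F is added so the list entries are treated as fixed strings.

```shell
# Hypothetical demo files in a temp directory; all names are illustrative only.
tmp=$(mktemp -d)
printf '18mAF\r\n67mAF\r\n' > "$tmp/list_crlf"   # pattern list with CRLF line endings
printf '67mAF-NODE_54-g4372\t67mAF\tcluster06\n' > "$tmp/target"

# Count list lines that contain a carriage return (non-zero means CRLF is present).
cr_lines=$(grep -c "$(printf '\r')" "$tmp/list_crlf")

# With the CR still attached, each pattern is really "67mAF<CR>", so nothing matches.
# grep exits non-zero when there are no matches, hence the "|| true".
before=$(grep -c -w -F -f "$tmp/list_crlf" "$tmp/target" || true)

# Strip carriage returns with tr, then repeat the word-level match.
tr -d '\r' < "$tmp/list_crlf" > "$tmp/list_lf"
after=$(grep -c -w -F -f "$tmp/list_lf" "$tmp/target")

echo "CR lines: $cr_lines, matches before: $before, matches after: $after"
rm -r "$tmp"
```

If the first count is zero for both of your real files, line endings are not the cause and the problem lies elsewhere.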

Upvotes: 1
