Reputation: 83
I am trying to use a list that looks like this:
List file:
1mAF
2mAF
4mAF
7mAF
9mAF
10mAF
11mAF
13mAF
18mAF
27mAF
33mAF
36mAF
37mAF
38mAF
39mAF
40mAF
41mAF
45mAF
46mAF
47mAF
49mAF
57mAF
58mAF
60mAF
61mAF
62mAF
63mAF
64mAF
67mAF
82mAF
86mAF
87mAF
95mAF
96mAF
to grab out lines that contain a word-level match in a tab delimited file that looks like this:
File_of_interest:
11mAF-NODE_111-g7687-JEFF-tig00000037_arrow-g7396-AFLA_058530 11mAF cluster63
17mAF-NODE_343-g9350 17mAF cluster07
18mAF-NODE_34-g3647-JEFF-tig00000037_arrow-g7396-AFLA_058530 18mAF cluster20
22mAF-NODE_36-g3735 22mAF cluster28
36mAF-NODE_107-g7427 36mAF cluster77
45mAF-NODE_151-g9067 45mAF cluster14
47mAF-NODE_30-g3242-JEFF-tig00000037_arrow-g7396-AFLA_058530 47mAF cluster21
67mAF-NODE_54-g4372 67mAF cluster06
69mAF-NODE_27-g2754 69mAF cluster39
71mAF-NODE_44-g4178 71mAF cluster25
73mAF-NODE_47-g4895 73mAF cluster57
78mAF-NODE_4-g688 78mAF cluster53
but when I do grep -w -f list file_of_interest
these are the only ones I get:
18mAF-NODE_34-g3647-JEFF-tig00000037_arrow-g7396-AFLA_058530 18mAF cluster20
36mAF-NODE_107-g7427 36mAF cluster77
45mAF-NODE_151-g9067 45mAF cluster14
and this misses a bunch of the values that are in the original list for example note that "67mAF" is in the list and in the file but it isn't returned.
I have tried removing everything after "mAF" in the list and trying again -- no change. I have rewritten the list in a completely new file to no avail. Oddly, I get more of them if I "sort" the list into a new file and then do the grep, but I still don't get all of them. I have also removed all invisible characters using sed (sed $'s/[^[:print:]\t]//g'
). no change.
I am on OSX and both files were created on OSX, but normally grep -f -w
works in the fashion i'm describing above.
I am completely flummoxed. Is I thought grep -w -f
would look for all word-level matches of items in the file in the target file... am I wrong?
Thanks!
Upvotes: 0
Views: 84
Reputation:
My guess is at least one of these files originates from a Windows machine and has CRLF line endings. file(1)
might be used to tell you. If that is the case do:
fromdos FILE
or, alternatively:
dos2unix FILE
Upvotes: 1