Reputation: 171
MacOS, Unix
I'm trying to use grep to isolate a particular ID from files that look like this:
# STOCKHOLM 1.0
#=GS WP_002089484.1/1-154 DE [subseq from] MULTISPECIES: AAC(3)-I family aminoglycoside 3-N-acetyltransferase [Proteobacteria]
WP_002089484.1/1-154 MGIIRTCRLGPDQVKSMRAALDLFGREFGDVATYSQHQPDSDYLGNLLRSKTFIALAAFDQEAVVGALAAYVLPKFEQARSEIYIYDLAVSGEHRRQGIATALINLLKHEANALGAYVIYVQADYGDDPAVALYTKLGIREEVMHFDIDPSTAT
#=GR WP_002089484.1/1-154 PP 9*******************************************************************************************************************************************************98
#=GC PP_cons 9*******************************************************************************************************************************************************98
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
//
I just want to isolate the ID, which is WP_002089484.1 in this example, but I have to do this for many files. The ID always starts with "WP_" and always ends with ".1", and I only want the unique occurrences from each file.
I tried something like:
grep -o "WP_.\{0,11\}" *.sto >> ProtID
but ProtID still has all the information from the original file.
Upvotes: 1
Views: 149
Reputation: 1523
If you want just unique occurrences from each file, the following command should help you:
grep -o "WP_.\{0,11\}" *.sto | sort | uniq
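grep prefixes each match with the file name when it is given more than one file, so with two files that both contain this ID the output will be: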
file1.sto:WP_002089484.1
file2.sto:WP_002089484.1
And if you want to remove the file name from the result:
grep -o "WP_.\{0,11\}" *.sto | sort | uniq | grep -o "WP_.\{0,11\}"
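In this case, the output will be as follows. The ID still appears once per file, because the duplicates were removed while the file name was still attached: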
WP_002089484.1
WP_002089484.1
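As a possible alternative, here is a minimal sketch that uses a stricter pattern and avoids the second grep. It assumes the ID is always "WP_", a run of digits, then ".1", as described in the question. The scratch directory, the shortened sample files, and the ProtID_all file name are made up for the demo. The -o and -h options are not part of POSIX grep, but both GNU grep and the BSD grep shipped with macOS support them.

```shell
#!/bin/sh
# Hypothetical demo data: two shortened .sto files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' \
    '#=GS WP_002089484.1/1-154 DE [subseq from] example' \
    'WP_002089484.1/1-154 MGIIRT' \
    '#=GR WP_002089484.1/1-154 PP 9**98' > file1.sto
printf '%s\n' \
    '#=GS WP_000000001.1/1-10 DE example' \
    'WP_000000001.1/1-10 MGII' > file2.sto

# Unique IDs per file: grep one file at a time, so no "file:" prefix
# is printed, and let sort -u drop the repeats within that file.
# WP_[0-9]*\.1 matches "WP_", any digits, then a literal ".1".
for f in *.sto; do
    grep -o 'WP_[0-9]*\.1' "$f" | sort -u
done > ProtID

# Unique IDs across all files: -h suppresses the file name prefix.
grep -oh 'WP_[0-9]*\.1' *.sto | sort -u > ProtID_all

cat ProtID
```

Using > instead of >> means ProtID is rewritten on every run rather than accumulating results from earlier attempts. If the same ID occurs in several files, the loop keeps one copy per file, while the ProtID_all variant keeps a single copy overall.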
Upvotes: 1