D.Parker

Reputation: 171

How to use grep to retrieve a string within a line

MacOS, Unix

I'm trying to use grep to isolate a particular ID within a line as follows:

# STOCKHOLM 1.0

#=GS WP_002089484.1/1-154 DE [subseq from] MULTISPECIES: AAC(3)-I family aminoglycoside 3-N-acetyltransferase [Proteobacteria]

WP_002089484.1/1-154 MGIIRTCRLGPDQVKSMRAALDLFGREFGDVATYSQHQPDSDYLGNLLRSKTFIALAAFDQEAVVGALAAYVLPKFEQARSEIYIYDLAVSGEHRRQGIATALINLLKHEANALGAYVIYVQADYGDDPAVALYTKLGIREEVMHFDIDPSTAT
#=GR WP_002089484.1/1-154 PP 9*******************************************************************************************************************************************************98
#=GC PP_cons                 9*******************************************************************************************************************************************************98
#=GC RF                      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
//

I want to extract only the ID, WP_002089484.1. I have to do this for many files; the ID always starts with "WP_" and always ends with ".1", and I want only the unique occurrences from each file.

I tried something like:

grep -o "WP_.\{0,11\}" *.sto >> ProtID

but ProtID still has all the information from the original file.

Upvotes: 1

Views: 149

Answers (1)

Thiago Procaci

Reputation: 1523

If you want just unique occurrences from each file, the following command should help you:

grep -o "WP_.\{0,11\}" *.sto | sort | uniq

The output will be:

file1.sto:WP_002089484.1
file2.sto:WP_002089484.1
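Note that `.\{0,11\}` matches up to 11 arbitrary characters after "WP_", so it returns a clean ID only when the part after "WP_" is exactly 11 characters long, as in WP_002089484.1. A stricter pattern is sketched below, assuming the IDs are always "WP_", then digits, then ".1", as the question describes. The scratch directory, the file names, and the shorter ID WP_12345.1 are made up for illustration.

```shell
# Scratch directory with two small sample files (names and contents are illustrative).
tmp=$(mktemp -d)
printf '%s\n' \
  '#=GS WP_002089484.1/1-154 DE [subseq from] MULTISPECIES' \
  'WP_002089484.1/1-154 MGIIRTCRLGPDQVKSMRAALDLFG' \
  '#=GR WP_002089484.1/1-154 PP 9****98' > "$tmp/file1.sto"
printf '%s\n' \
  '#=GS WP_12345.1/1-90 DE [subseq from] example' \
  'WP_12345.1/1-90 MGIIRTCRLG' > "$tmp/file2.sto"

# Assumption: an ID is "WP_", then one or more digits, then a literal ".1".
# [0-9]\{1,\} is POSIX basic regex for "one or more digits"; \. matches a literal dot.
# sort -u does the same job as sort | uniq.
ids=$(cd "$tmp" && grep -o 'WP_[0-9]\{1,\}\.1' *.sto | sort -u)
printf '%s\n' "$ids"

rm -rf "$tmp"
```

With the original `.\{0,11\}` pattern, the shorter ID in file2.sto would be returned together with the characters that follow it on the line.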

And if you want to remove the file name from the result:

grep -o "WP_.\{0,11\}" *.sto | sort | uniq | grep -o "WP_.\{0,11\}"

In this case, the output will be:

WP_002089484.1
WP_002089484.1
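An alternative to the second grep is to process one file at a time, so that no file name prefix is printed in the first place. This is only a sketch: the scratch directory and sample files are made up for illustration, and it uses the same pattern as above. The `-h` option (suppress file names) is available in both GNU and BSD grep.

```shell
# Scratch directory with two sample files containing the same ID (illustrative data).
tmp=$(mktemp -d)
for name in file1.sto file2.sto; do
  printf '%s\n' \
    '#=GS WP_002089484.1/1-154 DE [subseq from] MULTISPECIES' \
    'WP_002089484.1/1-154 MGIIRTCRLGPDQVKSMRAALDLFG' > "$tmp/$name"
done

# One grep per file: -o prints only the match, -h suppresses the file name,
# and sort -u removes duplicates within that file only.
out=$(cd "$tmp" && for f in *.sto; do
  grep -oh 'WP_.\{0,11\}' "$f" | sort -u
done)
printf '%s\n' "$out"

rm -rf "$tmp"
```

Because duplicates are removed per file, an ID that occurs in two files is still listed once for each file, which matches the output shown above.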

Upvotes: 1
