Roy

Reputation: 743

Using sed to extract multiple matches

I have the following line:

echo AS:i:0  UQ:i:0  ZZ:Z:mus.sup  NM:i:0  MD:Z:50  ZZ:Z:cas.sup  CO:Z:endOfLine|sed 's/.*\(ZZ:Z:.*[ ]\).*/\1/g'

which outputs:

ZZ:Z:cas.sup

I'd like to use sed to extract both ZZ:Z entries from the given line. Please avoid awk field-position solutions, since the position of the ZZ:Z entries may differ from line to line in my file.

Preferred output:

ZZ:Z:mus.sup  ZZ:Z:cas.sup 

Or possibly:

ZZ:Z:mus.sup  
ZZ:Z:cas.sup 

Thanks.

Upvotes: 0

Views: 461

Answers (2)

SLePort

Reputation: 15461

Try grep with the -o (or --only-matching) flag:

$ grep -o 'ZZ:Z:[^ ]* ' <<< "AS:i:0  UQ:i:0  ZZ:Z:mus.sup  NM:i:0  MD:Z:50  ZZ:Z:cas.sup  CO:Z:endOfLine"
ZZ:Z:mus.sup 
ZZ:Z:cas.sup
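If you want the matches back on a single line, as in your preferred output, one option is to join them with paste. This is only a sketch: the variable names line and result are mine, the pattern drops the trailing space so that a ZZ:Z entry at the end of the line would also match, and grep -o is a GNU/BSD extension rather than POSIX.

```shell
line='AS:i:0  UQ:i:0  ZZ:Z:mus.sup  NM:i:0  MD:Z:50  ZZ:Z:cas.sup  CO:Z:endOfLine'

# grep -o prints every match on its own line;
# paste -s joins those lines, using a single space as the delimiter.
result=$(printf '%s\n' "$line" | grep -o 'ZZ:Z:[^ ]*' | paste -sd ' ' -)

printf '%s\n' "$result"
```

Note that running grep -o over a whole file prints all matches one per line, so the original line boundaries are lost unless you process the file line by line.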

Or with sed, based on an answer by @potong (GNU sed, since it relies on \n in the replacement):

sed 's/ZZ:Z:/\n&/g;s/[^\n]*\n\(ZZ:Z:[^ ]* \)[^\n]*/\1 /g;s/.$//'

If each line contains exactly two occurrences of the pattern:

sed -n 's/.*\(ZZ:Z[^ ]*\).*\(ZZ:Z[^ ]*\).*/\1 \2/p' <<< "AS:i:0  UQ:i:0  ZZ:Z:mus.sup  NM:i:0  MD:Z:50  ZZ:Z:cas.sup  CO:Z:endOfLine" 

Upvotes: 1

Thomas Baruchel

Reputation: 7517

You can surely achieve this with sed, but wouldn't a tr and grep solution be more natural? You seem to have several logical records here, even though they appear on a single line:

echo AS:i:0  UQ:i:0  ZZ:Z:mus.sup  NM:i:0  MD:Z:50  ZZ:Z:cas.sup  CO:Z:endOfLine | tr ' ' '\n' | grep "ZZ:Z"

If you want everything back on a single line, just add | tr '\n' ' ' at the end to convert the newlines back into spaces.
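Put together, the whole pipeline could look like the sketch below. The variable names line and joined are only for illustration, and I anchored the pattern with ^ so that only fields starting with ZZ:Z: are kept, which is a small change from the command above.

```shell
line='AS:i:0  UQ:i:0  ZZ:Z:mus.sup  NM:i:0  MD:Z:50  ZZ:Z:cas.sup  CO:Z:endOfLine'

# 1. tr puts every space-separated field on its own line
#    (double spaces produce empty lines, which grep discards).
# 2. grep keeps only the fields that start with ZZ:Z:.
# 3. the second tr turns the newlines back into spaces.
joined=$(printf '%s\n' "$line" | tr ' ' '\n' | grep '^ZZ:Z:' | tr '\n' ' ')

printf '%s\n' "$joined"
```

The result ends with a trailing space, because the last newline is converted as well.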

Of course you could also replace grep with sed in this solution.
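For instance, a sed variant might look like this (a sketch with the same illustrative variable names; sed -n with the p command prints only the lines that match the address):

```shell
line='AS:i:0  UQ:i:0  ZZ:Z:mus.sup  NM:i:0  MD:Z:50  ZZ:Z:cas.sup  CO:Z:endOfLine'

# Split the fields onto separate lines, then let sed print
# only the lines that begin with ZZ:Z:.
matches=$(printf '%s\n' "$line" | tr ' ' '\n' | sed -n '/^ZZ:Z:/p')

printf '%s\n' "$matches"
```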

Upvotes: 1
