lecodesportif

Reputation: 11069

sed - extract STRING between first occurrence of MATCH1 and next occurrence of MATCH2

Using sed, I would like to extract STRING between the first occurrence of MATCH1 and the next occurrence of MATCH2.

echo "abcd MATCH1 STRING MATCH2 efgh MATCH1 ijk MATCH2 MATCH2 lmnop MATCH1" | sed...

I have tried this in various ways, but because MATCH1 and MATCH2 may each appear several times, sometimes back to back, it has turned out to be difficult to extract STRING. Any idea how I can achieve this?

Upvotes: 5

Views: 12307

Answers (4)

potong

Reputation: 58420

This might work for you (GNU sed; note that the delimiters are kept in the output):

echo "abcd MATCH1 STRING MATCH2 efgh MATCH1 ijk MATCH2 MATCH2 lmnop MATCH1" | 
sed 's/MATCH1/\n&/;s/[^\n]*\n//;s/\(MATCH2\).*/\1/'
MATCH1 STRING MATCH2
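A minimal sketch that wraps the command above in a shell function so it can be reused on other input. The name between_inclusive is hypothetical, and GNU sed is assumed because \n in the replacement and [^\n] inside a bracket expression are GNU extensions.

```shell
# Hypothetical helper: print the text from the first MATCH1 through the
# next MATCH2, delimiters included (GNU sed assumed).
between_inclusive() {
    # 1. insert a newline before the first MATCH1
    # 2. delete everything from the start up to and including that newline
    # 3. keep the first remaining MATCH2 and drop everything after it
    sed 's/MATCH1/\n&/;s/[^\n]*\n//;s/\(MATCH2\).*/\1/'
}

echo "abcd MATCH1 STRING MATCH2 efgh MATCH1 ijk MATCH2 MATCH2 lmnop MATCH1" |
    between_inclusive
```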

Upvotes: 0

SiegeX

Reputation: 140327

You can do this with two calls to sed: the first replaces each space with a newline, and its output is piped to a second sed that deletes every line except those between the first MATCH1 and the next MATCH2.

sed 's/ /\n/g' | sed '1,/MATCH1/d;/MATCH2/,$d'
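Applied to the sample line from the question, the pipeline prints STRING on a line of its own. A minimal sketch, assuming GNU sed (for \n in the replacement); extract_word is a hypothetical name. Because every space becomes a line break, a STRING made of several words would come out one word per line.

```shell
# Hypothetical helper wrapping the two-sed pipeline (GNU sed assumed).
extract_word() {
    # First sed: put one word per line.
    # Second sed: delete line 1 through the first MATCH1 line, then delete
    # from the first MATCH2 line after that to the end of input.
    sed 's/ /\n/g' | sed '1,/MATCH1/d;/MATCH2/,$d'
}

echo "abcd MATCH1 STRING MATCH2 efgh MATCH1 ijk MATCH2 MATCH2 lmnop MATCH1" |
    extract_word
```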


Edit

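If the first line (after substitution) happens to be MATCH1, 1,/MATCH1/ cannot end its range there, because the closing regex is only tested from line 2 onward. GNU sed can work around that by using 0,/MATCH1/ instead of 1,/MATCH1/, like so: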

sed 's/ /\n/g' | sed '0,/MATCH1/d;/MATCH2/,$d'
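The difference shows up on input whose first word is MATCH1. In this sketch (GNU sed assumed; the sample input is made up for illustration), the 1, form runs on to the second MATCH1 and prints ijk, while the 0, form stops at line 1 and prints STRING.

```shell
input="MATCH1 STRING MATCH2 efgh MATCH1 ijk MATCH2"

# 1,/MATCH1/ looks for the closing MATCH1 only from line 2 onward,
# so it deletes through the second MATCH1; prints: ijk
echo "$input" | sed 's/ /\n/g' | sed '1,/MATCH1/d;/MATCH2/,$d'

# 0,/MATCH1/ (GNU extension) can close the range on line 1 itself;
# prints: STRING
echo "$input" | sed 's/ /\n/g' | sed '0,/MATCH1/d;/MATCH2/,$d'
```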

Edit2

An optimized version of the single-call sed solution that needs only 3 substitutions instead of 4:

sed -r 's/MATCH1/&\n/;s/MATCH2/\n&/;s/^.*\n(.*)\n.*$/\1/'
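Traced on the sample line, the first two substitutions put a newline right after the first MATCH1 and right before the first MATCH2, and the third keeps only what lies between the two newlines, surrounding spaces included. A sketch assuming GNU sed, where . also matches an embedded newline; the brackets in the printf are only there to make the spaces visible. Since the second substitution targets the first MATCH2 anywhere on the line, this assumes no MATCH2 appears before the first MATCH1.

```shell
line="abcd MATCH1 STRING MATCH2 efgh MATCH1 ijk MATCH2 MATCH2 lmnop MATCH1"

# s/MATCH1/&\n/        -> newline after the first MATCH1
# s/MATCH2/\n&/        -> newline before the first MATCH2
# s/^.*\n(.*)\n.*$/\1/ -> keep only the text between the two newlines
result=$(echo "$line" | sed -r 's/MATCH1/&\n/;s/MATCH2/\n&/;s/^.*\n(.*)\n.*$/\1/')

# Brackets show the leading and trailing spaces; prints: [ STRING ]
printf '[%s]\n' "$result"
```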

Upvotes: 2

Dennis Williamson

Reputation: 360095

These only return the string between the matches and work even if MATCH1 == MATCH2.

echo ... | grep -Po '^.*?\K(?<=MATCH1).*?(?=MATCH2)'

Here's a sed solution:

echo ... | sed 's/MATCH1/&\n/;s/.*\n//;s/MATCH2/\n&/;s/\n.*//'

The advantage of these compared to some of the other solutions is that each one consists of only one call to a single utility.
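The claim that both commands cope with identical delimiters can be checked by using the same word for MATCH1 and MATCH2. In this sketch the delimiter X and the input line are made up for illustration; GNU grep built with PCRE support (-P) and GNU sed are assumed.

```shell
line="a X b X c X d"

# PCRE: ^.*? advances lazily until the lookbehind (?<=X) sees an X,
# \K drops that prefix from the printed match, and .*?(?=X) takes the
# shortest run before the next X; prints " b "
echo "$line" | grep -Po '^.*?\K(?<=X).*?(?=X)'

# sed: mark the first X with a newline and drop through it, then mark
# the next X and drop from there to the end; prints " b "
echo "$line" | sed 's/X/&\n/;s/.*\n//;s/X/\n&/;s/\n.*//'
```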

Upvotes: 4

Brent Newey

Reputation: 4509

You can do it with perl using non-greedy regex matches:

echo "abcd MATCH1 STRING MATCH2 efgh MATCH1 ijk MATCH2 MATCH2 lmnop MATCH1" | perl -pe 's|^.*?MATCH1(.*?)MATCH2.*$|$1|'

sed does not support these: its basic and extended regular expressions have no non-greedy quantifiers.

EDIT: Here is a solution that combines the grep -P approach from Dennis's answer with sed:

echo "abcd MATCH1 STRING MATCH2 efgh MATCH1 ijk MATCH2 MATCH2 lmnop MATCH1" | grep -Po '^.*?MATCH1.*?MATCH2' | sed 's/^.*MATCH1\(.*\)MATCH2$/\1/'
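One behavioral difference worth knowing: the sed stage of this pipeline uses a greedy ^.*MATCH1, so if another MATCH1 occurs before the first MATCH2, it keeps only the text after the later MATCH1, whereas Dennis's single sed call keeps everything after the first one, as the question's wording asks. A sketch with made-up input, assuming GNU grep with -P and GNU sed.

```shell
line="x MATCH1 a MATCH1 b MATCH2 c"

# grep -Po + sed pipeline: the greedy ^.*MATCH1 skips to the second MATCH1;
# prints " b "
echo "$line" | grep -Po '^.*?MATCH1.*?MATCH2' | sed 's/^.*MATCH1\(.*\)MATCH2$/\1/'

# Dennis's single sed call: splits at the first MATCH1; prints " a MATCH1 b "
echo "$line" | sed 's/MATCH1/&\n/;s/.*\n//;s/MATCH2/\n&/;s/\n.*//'
```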

Upvotes: 3
