Marc Bredt

Reputation: 955

accessible longest match (from the beginning) without substring in replacement

i wondered whether sed can match the longest string (from the beginning) NOT containing a given substring, while keeping that match accessible afterwards through sed's replacement back-references \n.

regarding the following snippet

echo "blabla/a/b/dee/per" | sed -r -e 's:([^/a]*):\1:g'

i am trying to print the longest match made of any characters (indicated by *) that does not include the substring /a, so that the above snippet prints

blabla

and, with /a removed or replaced in the input,

echo "blabla/b/b/dee/per" | sed -r -e 's:([^/a]*):\1:g'

i am expecting

blabla/b/b/dee/per

as output, because the substring /a does not occur and the longest match therefore runs to the end of the string. i am stuck at describing the substring /a.

CAUTION: [^/a] is just a placeholder to describe the problem (as written it excludes the single characters / and a, not the substring /a). imo it needs to be replaced with a correct description of the substring. is that possible in some way using sed?

thank you in advance

EDIT: John1024's third answer completes this question. the following snippet is now used:

 sed -r -e 's:(/a|$):\x00:;s:^(.*)\x00(.*):\1:g'

EDIT: to fulfill my original task, which is to prepend values to paths with different prefixes containing a substring surrounded by other characters, i finally came up with

 $ echo -ne "blabla/a/b/dee/per\nblabla/b/dee/per" | \
   sed -r -e 's:(.*)/a/b:\1\x00:;s:(.*)/b:\1\x01:;s:^(.*)\x00(.*):\1/foo/a/b\2:g;s:^(.*)\x01(.*):\1/foo/b\2:g'
 blabla/foo/a/b/dee/per
 blabla/foo/b/dee/per

which first replaces the path prefixes /a/b or /b with \x00 or \x01 respectively, making the sed groups, i.e. the prefix and suffix paths, accessible through \n as described in John1024's answer.

NOTE: an additional trick used here, to keep (.*)/b from also matching (.*)/a/b, is to replace the longest path prefixes first. thanks again @John1024
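For comparison, here is a minimal sketch of the same "longest prefix first" idea without marker characters. It is an alternative, not the command used above: the helper name insert_foo is hypothetical, and it inserts /foo before the first /a/b or /b on each line, whereas the greedy (.*) above picks the last one. The t command skips the second substitution whenever the first one succeeded.

```shell
# Hypothetical helper: insert /foo before /a/b, or before /b when /a/b is absent.
insert_foo() {
    # 1st s: try the longer prefix /a/b first (& stands for the matched text).
    # t    : if that substitution succeeded, skip the rest of the script.
    # 2nd s: otherwise fall back to the shorter prefix /b.
    sed -r -e 's:/a/b:/foo&:' -e t -e 's:/b:/foo&:'
}

printf 'blabla/a/b/dee/per\nblabla/b/dee/per\n' | insert_foo
```

For the two sample paths this gives blabla/foo/a/b/dee/per and blabla/foo/b/dee/per, the same as the marker-based version.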

Upvotes: 1

Views: 1639

Answers (1)

John1024

Reputation: 113864

Find string from beginning until first occurrence of /a (2nd version of question)

$ echo "blabla/a/b/dee/per" | sed 's|/a.*||'
blabla

$ echo "blabla/b/b/dee/per" | sed 's|/a.*||'
blabla/b/b/dee/per
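If the string is already in a shell variable, the same truncation can be sketched with plain POSIX parameter expansion instead of sed. The helper name before_first_a is hypothetical and only serves the illustration.

```shell
# Hypothetical helper: print everything before the first "/a" in $1.
before_first_a() {
    # %%/a* strips the longest suffix that starts with "/a",
    # i.e. everything from the first "/a" to the end of the string.
    # If "/a" does not occur, nothing is stripped.
    printf '%s\n' "${1%%/a*}"
}

before_first_a "blabla/a/b/dee/per"
before_first_a "blabla/b/b/dee/per"
```

The first call prints blabla and the second prints the path unchanged, matching the sed output above.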

Find longest string not containing /a (Original Question)

This problem is a more natural fit for awk (the multi-character RS used here works in GNU awk and mawk; POSIX leaves its behavior unspecified):

$ echo "blabla/a/b/dee/per" | awk -v RS='/a' 'length($0)>max{longest=$0; max=length(longest);} END{print longest;}'
/b/dee/per

$ echo "blabla/b/b/dee/per" | awk -v RS='/a' 'length($0)>max{longest=$0; max=length(longest);} END{print longest;}'
blabla/b/b/dee/per

How it works

  • -v RS='/a'

    This sets the record separator to /a. This divides the input upon every occurrence of /a.

  • length($0)>max{longest=$0; max=length(longest);}

    If the current record, $0, is longer than the previous longest record, update longest and max with the new record.

  • END{print longest;}

    When we reach the end of the input, print out the longest record that we saw.
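Where an awk with multi-character RS is not available, the same split-and-compare logic might be sketched in plain POSIX shell. This is an illustration under assumptions, not part of the awk solution: the function name longest_segment is hypothetical, it takes the string as an argument rather than on stdin, and unlike the awk version it does not count the trailing newline as part of the last record.

```shell
# Hypothetical helper: print the longest piece of $1 after splitting on "/a".
longest_segment() {
    rest=$1
    longest=
    while :; do
        # Text before the first "/a" (or all of $rest if there is none).
        segment=${rest%%/a*}
        # Keep it only if it is strictly longer than the best so far,
        # mirroring length($0)>max in the awk program.
        if [ "${#segment}" -gt "${#longest}" ]; then
            longest=$segment
        fi
        case $rest in
            */a*) rest=${rest#*/a} ;;  # drop everything through the first "/a"
            *)    break ;;             # no separator left: last record handled
        esac
    done
    printf '%s\n' "$longest"
}

longest_segment "blabla/a/b/dee/per"
longest_segment "blabla/b/b/dee/per"
```

The first call prints /b/dee/per (10 characters, longer than blabla) and the second prints the whole path.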

Capture string from beginning up to the first /a in a sed group (3rd version of question)

$ echo "blabla/a/b/dee/per" | sed -r 's!(/a|$)!\x00!; s|^(.*)\x00.*|I found "\1".|'
I found "blabla".

$ echo "blabla/b/b/dee/per" | sed -r 's!(/a|$)!\x00!; s|^(.*)\x00.*|I found "\1".|'
I found "blabla/b/b/dee/per".

How it works:

  • s!(/a|$)!\x00!

    This replaces the first occurrence of /a with the NUL character, \x00. If no occurrence of /a is found, then the NUL character is placed at the end of the string (signified in a regex by $). (The NUL character was chosen because it can never be held in a bash variable and, thus, is extremely unlikely to be in an input string.)

  • s|^(.*)\x00.*|I found "\1".|

    This saves in group 1 all the characters up to the location where the first /a used to be. We can then use \1 in the replacement as we please.

As written, this requires a sed, such as GNU sed, which supports the NUL-character, hex 00. If your sed does not support NUL, then replace the \x00 with some character that won't be in your input string but that your sed does support. \x01 might be a good second choice.
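As a sketch of the \x01 variant, the two substitutions can be wrapped in a shell function that exposes both the part before the first /a and the part after it. The function name split_at_first_a and the output format are hypothetical; GNU sed is assumed for -r and the \x01 escape.

```shell
# Hypothetical helper: show the text before and after the first "/a" in $1.
split_at_first_a() {
    # 1st s: replace the first "/a" (or, failing that, the end of line)
    #        with the marker character \x01.
    # 2nd s: group 1 is everything before the marker, group 2 everything after.
    printf '%s\n' "$1" |
        sed -r 's!(/a|$)!\x01!; s|^(.*)\x01(.*)$|prefix="\1" rest="\2"|'
}

split_at_first_a "blabla/a/b/dee/per"
split_at_first_a "blabla/b/b/dee/per"
```

The first call prints prefix="blabla" rest="/b/dee/per"; the second prints prefix="blabla/b/b/dee/per" rest="", because the marker lands at the end of the line.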

Upvotes: 2
