Whit3H0rse

Reputation: 568

Regex: match string with previous match

I'm trying hard to resolve this regex puzzle but this is above my expertise...

I have strings like this:

String 1:

Interface123|HostVienna ~ Tunnel22 ~ CustomerA ~ ServiceA  ~ Vienna, Street 10|HostAmsterdam ~ Interface123 ~ CustomerB ~ ServiceA  ~ Amsterdam, Street 40|HostSarajevo ~ Interface12 ~ CustomerC ~ ServiceA ~ Sarajevo, Street 1040

String 2:

Interface123|HostAmsterdam ~ Interface123 ~ CustomerB ~ ServiceA  ~ Amsterdam,Street 40

I'm trying to write a single regex that captures everything from the beginning of the string up to the first "|" (the word) and then uses that capture to find the "|"-delimited segment that contains the word. In my examples, the word is Interface123.

From either example above, the result should be:

HostAmsterdam ~ Interface123 ~ CustomerB ~ ServiceA  ~ Amsterdam,Street 40

Is this possible with pure regex?

Upvotes: 0

Views: 467

Answers (3)

mickmackusa

Reputation: 47864

/^([^|]+)\|(?:[^|]+\|)*?\K[^|]*\b\1\b[^|]*/

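It is important to capture the needle and, at a minimum, use word boundaries when searching for its next occurrence. Without them, a needle such as Interface12 would also match inside Interface123.

The match must also be allowed to occur in the first, middle, or last data set of the string. That is the job of (?:[^|]+\|)*?, which lazily skips zero or more preceding data sets.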

Demo: https://regex101.com/r/7lMwZf/5

Breakdown:

^                     # start of string
([^|]+)\|             # capture needle then match first delimiting pipe
(?:[^|]+\|)*?         # match zero or more "sets of data"
\K[^|]*\b\1\b[^|]*    # forget previously matched characters with \K, then match set of data containing the needle until the end of string or first encountered pipe
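
For readers whose engine lacks \K (Python's built-in re module is one such engine), here is a minimal sketch of the same idea with the wanted data set moved into a second capture group. The function name find_segment and the short third test string are illustrative assumptions, not part of the original answer.

```python
import re

# Capture-group variant of the pattern above: Python's re has no \K,
# so the wanted data set is captured in group 2 instead.
PATTERN = re.compile(r"^([^|]+)\|(?:[^|]+\|)*?([^|]*\b\1\b[^|]*)")


def find_segment(line):
    """Return the first pipe-delimited data set containing the leading needle, or None."""
    m = PATTERN.search(line)
    return m.group(2) if m else None


string_1 = (
    "Interface123|HostVienna ~ Tunnel22 ~ CustomerA ~ ServiceA  ~ Vienna, Street 10"
    "|HostAmsterdam ~ Interface123 ~ CustomerB ~ ServiceA  ~ Amsterdam, Street 40"
    "|HostSarajevo ~ Interface12 ~ CustomerC ~ ServiceA ~ Sarajevo, Street 1040"
)
string_2 = "Interface123|HostAmsterdam ~ Interface123 ~ CustomerB ~ ServiceA  ~ Amsterdam,Street 40"

# Hypothetical input: the needle Interface12 must not match inside Interface123.
boundary_case = "Interface12|HostX ~ Interface123|HostY ~ Interface12"

print(find_segment(string_1))
print(find_segment(string_2))
print(find_segment(boundary_case))
```

The third call is there to exercise the word boundaries: the data set containing Interface123 is skipped and the one containing the exact needle is returned.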

Upvotes: 1

Mikhail Vladimirov

Reputation: 13890

This is possible with regexp back references, though not every implementation supports them. Something like:

^([^|]+)\|(?:[^|]*\|)*?([^|]*\1[^|]*)

The second group will capture what you need.

Explanation: ^([^|]+)\| captures your keyword, (?:[^|]*\|)*? lazily skips zero or more '|'-terminated segments that precede the one containing the keyword, and ([^|]*\1[^|]*) matches the segment you finally need.
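
As a rough illustration, the pattern can be tried as-is in Python, whose re module supports backreferences. The helper name second_group and the short substring_case input are assumptions made for this sketch. The last call shows a caveat: without word boundaries, the keyword is matched as a plain substring.

```python
import re

# The pattern from this answer, unchanged; group 2 holds the wanted segment.
PATTERN = re.compile(r"^([^|]+)\|(?:[^|]*\|)*?([^|]*\1[^|]*)")


def second_group(line):
    """Return group 2 of the first match, or None when nothing matches."""
    m = PATTERN.search(line)
    return m.group(2) if m else None


string_2 = "Interface123|HostAmsterdam ~ Interface123 ~ CustomerB ~ ServiceA  ~ Amsterdam,Street 40"

# Hypothetical input: the keyword Interface12 is a prefix of Interface123.
substring_case = "Interface12|HostX ~ Interface123|HostY ~ Interface12"

print(second_group(string_2))
# Because \1 is not wrapped in \b, the segment with Interface123 is returned here.
print(second_group(substring_case))
```

If exact-word matching matters for your data, wrapping the backreference as \b\1\b is one way to tighten it.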

Upvotes: 1

Tim Biegeleisen

Reputation: 520928

Here is a pattern which seems to be working:

(?<=\||^)((?:(?!\|).)*Interface123.*?)(?=\||$)

This uses tempered dots to match the segment of the string you want, containing Interface123. Here is a brief explanation:

(?<=\||^)       assert that what precedes is either a pipe or the start of the string
((?:(?!\|).)*   consume anything so long as it is NOT a pipe
Interface123.*? match 'Interface123' followed by anything up to
(?=\||$)        assert that what follows is either a pipe or the end of the string
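
A small sketch for trying this pattern outside a PCRE engine, assuming Python. Python's re module rejects (?<=\||^) because the two lookbehind alternatives differ in width, so the sketch splits it into (?:(?<=\|)|^); that rewrite and the variable names are assumptions of this example. Note that the keyword is hard-coded in the pattern, and that the leading Interface123 field is itself a segment containing the keyword, so it is matched too.

```python
import re

# Lookbehind split into two branches so Python's re accepts it;
# the rest of the pattern is the one from this answer.
PATTERN = re.compile(r"(?:(?<=\|)|^)((?:(?!\|).)*Interface123.*?)(?=\||$)")

string_2 = "Interface123|HostAmsterdam ~ Interface123 ~ CustomerB ~ ServiceA  ~ Amsterdam,Street 40"

# findall returns group 1 of every match, scanning left to right.
matches = PATTERN.findall(string_2)

# The first match is the leading keyword field itself, so drop it
# to keep only the data segments that contain the keyword.
wanted = matches[1:]

print(matches)
print(wanted)
```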

This answer uses lookarounds, but based on your comment that your regex flavor is Perl compatible, this should not be a problem.

Upvotes: 0
