pu4cu

Reputation: 165

Capture a string or part of a string up until a certain character

I have the following text:

    https://stackoverflow.com | https://google.com | first text to match | 
    https://randomsite.com | https://randomurl2.com | text | https://randomsite.com | 
    https://randomsite.com | https://randomsite.com |

I'm trying to match everything up to the first field that is not a URL, ending at the | that follows it. In this example I would like the regex to match:

    https://stackoverflow.com | https://google.com | first text to match |

Currently I have this:

    /^(.*)[|]\s(\b\w*\b)?\s[|]/gm

However, this only works if the first field that is not a URL is a single word without spaces. If "first text to match" were just "first", it would match.

The regex should match in both cases: text without spaces and text with spaces.

EDIT: Sometimes I would also need a greedy match, where the regex would match everything up until text |.

Upvotes: 1

Views: 149

Answers (2)

Jay

Reputation: 3950

You need to allow whitespace inside the captured text:

    /^(.*)[|]\s(\b(\w|\s)*\b)?\s[|]/gm
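As a quick check, here is a minimal PHP sketch of this pattern. PHP has no g flag, so only m is kept, and the sample string $text is the question's text with the indentation removed (an assumption made to keep the example short). Group 2 holds the text between the pipes.

```php
<?php
// Sample input: the question's text without the leading indentation (assumed).
$text = "https://stackoverflow.com | https://google.com | first text to match |\n"
      . "https://randomsite.com | https://randomurl2.com | text | https://randomsite.com |\n"
      . "https://randomsite.com | https://randomsite.com |";

// (\w|\s)* lets the captured text contain whitespace as well as word characters.
$pattern = '/^(.*)[|]\s(\b(\w|\s)*\b)?\s[|]/m';

$found = preg_match($pattern, $text, $m);

echo $m[0], PHP_EOL; // whole match, from the start of the line to the closing pipe
echo $m[2], PHP_EOL; // only the text: first text to match
```

Because (\w|\s) only accepts word characters and whitespace, text containing punctuation will not match with this version.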

If you want to allow all sorts of special characters in the text (including new lines), you can try this approach:

    \|\s*((?!\s*\w+:\/\/)[^|]+?)\s\|

https://regex101.com/r/2OOKky/1
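A minimal PHP sketch of how this pattern could be used, assuming a single line of the question's input as the sample. The ~ delimiter and the variable names are my choices, not part of the pattern. The text ends up in capture group 1.

```php
<?php
// One line of the question's input (assumed sample).
$line = 'https://stackoverflow.com | https://google.com | first text to match |';

// The negative lookahead (?!\s*\w+:\/\/) rejects fields that start with a
// scheme such as https://, so URL fields are skipped.
$pattern = '~\|\s*((?!\s*\w+:\/\/)[^|]+?)\s\|~';

$found = preg_match($pattern, $line, $m);

echo $m[1], PHP_EOL; // first text to match
```

Note that the pattern starts with \|, so a text field at the very beginning of the input, with no pipe in front of it, would not be matched.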

If you want to allow all sorts of special characters in the text (but no new lines), you can try this approach:

    (?:^|\|)(?:(?!$)\s)+((?!\s*\w+:\/\/)(?:(?!$)[^|])+?)(?:(?!$)\s)*\|

https://regex101.com/r/HS3bra/1
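To see the difference between the two patterns, here is a small PHP sketch. The input with a line break inside the text field is hypothetical, and running the second pattern with the m flag is my assumption: the (?!$) checks only stop at line breaks when $ can match at the end of each line.

```php
<?php
// Hypothetical input whose text field contains a line break.
$text = "https://a.example | first\ntext | https://b.example |";

// Pattern that allows new lines inside the text.
$multiLine = '~\|\s*((?!\s*\w+:\/\/)[^|]+?)\s\|~';

// Pattern that does not allow new lines; the m flag is assumed here.
$singleLine = '~(?:^|\|)(?:(?!$)\s)+((?!\s*\w+:\/\/)(?:(?!$)[^|])+?)(?:(?!$)\s)*\|~m';

$a = preg_match($multiLine, $text, $m1);   // 1: group 1 spans the line break
$b = preg_match($singleLine, $text, $m2);  // 0: the text may not cross a line break

// On a field without a line break the second pattern matches as well.
$line = 'https://stackoverflow.com | https://google.com | first text to match |';
$c = preg_match($singleLine, $line, $m3);

var_dump($a, $b, $c);
```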

Upvotes: 0

The fourth bird

Reputation: 163362

If there has to be at least one leading URL:

    \A[\s\S]*?\b\K(?:https?://\S*\h*\|\h*)+[^\s|][^|\r\n]*\|

Explanation

  • \A Start of string
  • [\s\S]*? Match any character, as few times as possible
  • \b\K A word boundary, then forget what is matched so far
  • (?:https?://\S*\h*\|\h*)+ Match one or more URLs, each followed by a | between optional horizontal whitespace
  • [^\s|] Match a non whitespace char except for a pipe
  • [^|\r\n]* Optionally match any char except a pipe or a newline, then match the last pipe

Regex demo

If it is also fine to have no leading URLs:

    \A[\s\S]*?\b\K(?:https?://\S*\h*\|\h*)*[^\s|][^|\r\n]*\|

Regex demo
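A short PHP sketch of the variant with optional URLs. The first input string is hypothetical and starts with plain text instead of a URL. The second is the first line of the question's text.

```php
<?php
// Same pattern as above, with * instead of + so the leading URLs are optional.
$re = '~\A[\s\S]*?\b\K(?:https?://\S*\h*\|\h*)*[^\s|][^|\r\n]*\|~';

// Hypothetical input that starts with plain text rather than a URL.
$plain = "  plain text | https://randomsite.com |";
$foundPlain = preg_match($re, $plain, $m1);
echo $m1[0], PHP_EOL; // plain text |

// First line of the question's text: the leading URLs are still part of the match.
$urls = 'https://stackoverflow.com | https://google.com | first text to match |';
$foundUrls = preg_match($re, $urls, $m2);
echo $m2[0], PHP_EOL;
```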

Example

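    // Pattern that requires at least one leading URL; ~ is used as the delimiter
    // so the slashes in https?:// do not need escaping.
    $re = '~\A[\s\S]*?\b\K(?:https?://\S*\h*\|\h*)+[^\s|][^|\r\n]*\|~';
    $str = '    https://stackoverflow.com | https://google.com | first text to match | 
    https://randomsite.com | https://randomurl2.com | text | https://randomsite.com | 
    https://randomsite.com | https://randomsite.com |';

    // Because of \K, $matches[0] starts at the first URL, not at the leading whitespace.
    if (preg_match($re, $str, $matches)) {
        echo $matches[0];
    }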

Output

    https://stackoverflow.com | https://google.com | first text to match |

Upvotes: 2
