TranquilMarmot

Reputation: 2314

Regex to consume between two groups where the second group is optional

I have the following strings:

Sally: Hello there #line:34de2f
Bob: How are you today?

These strings have three parts: a "name", the "text", and an optional "line identifier".

I want to grab the "text" between the "name" and the optional "line identifier" using a regex.

This seems like what negative lookaheads are for:

(?<=:).*?(?!#line:.*)$

But this still captures the "line identifier".
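The reason can be checked with a quick Python sketch (Python is used here only because one of the answers below uses it). A lookahead placed directly before `$` is only ever evaluated at the end of the string, where nothing follows, so `(?!#line:.*)` never rejects anything and the lazy `.*?` has to stretch to the end.

```python
import re

# The first attempt from the question: a negative lookahead right before $.
attempt = re.compile(r"(?<=:).*?(?!#line:.*)$")

m = attempt.search("Sally: Hello there #line:34de2f")
# $ can only match at the very end, and at the end the lookahead sees an
# empty tail, so it always passes: the identifier stays in the match.
print(repr(m.group(0)))  # ' Hello there #line:34de2f'
```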

The following works, but I do not want to actually capture the "line identifier":

(?<=:).*?(#line:.*)?$
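A short Python check of this second attempt. It finds the right split point, but the optional group is part of the overall match, so the whole match still contains the identifier; the identifier is simply also available in group 1 (or `None` when it is absent).

```python
import re

# The second attempt from the question: an optional *capturing* group before $.
working = re.compile(r"(?<=:).*?(#line:.*)?$")

lines = ["Sally: Hello there #line:34de2f", "Bob: How are you today?"]
results = [working.search(line) for line in lines]

for m in results:
    # group(0): the whole match; group(1): the identifier or None.
    print(repr(m.group(0)), repr(m.group(1)))
# ' Hello there #line:34de2f' '#line:34de2f'
# ' How are you today?' None
```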

Upvotes: 1

Views: 661

Answers (3)

JGFMK

Reputation: 8904

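This too might work for you:

^([^:]*)[:]([^#]*)(?!line.*)

  • ^ - start of line
  • ([^:]*) - zero or more characters other than a colon, in a capturing group (the name)
  • [:] - the colon (this could be simplified to just :)
  • ([^#]*) - zero or more characters other than a hash symbol, in a capturing group (the text)
  • (?!line.*) - a negative lookahead. It never rejects a match here, because the character after ([^#]*) is always either # or the end of the string, so "line" cannot follow immediately.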
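A minimal Python sketch of this pattern. Group 2 keeps the space after the colon and the space before `#`, so the sketch strips it; the `.strip()` call is an addition for illustration, not part of the answer's regex.

```python
import re

# Group 1 is the name, group 2 the text (with surrounding whitespace).
pattern = re.compile(r"^([^:]*)[:]([^#]*)(?!line.*)")

lines = ["Sally: Hello there #line:34de2f", "Bob: How are you today?"]
parsed = []
for line in lines:
    m = pattern.match(line)
    # [^#]* stops just before '#', so trailing/leading spaces remain; strip them.
    parsed.append((m.group(1), m.group(2).strip()))

print(parsed)  # [('Sally', 'Hello there'), ('Bob', 'How are you today?')]
```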

Upvotes: 0

Mark Moretto

Reputation: 2348

Another solution (works in Python):

\w+:\s+?(.+)?\s+?#?.*?

Examples:

import re

tst1 = "Sally: Hello there #line:34de2f"
res1 = re.search(r"\w+:\s+?(.+)?\s+?#?.*?", tst1)
res1.group(1)  # 'Hello there'

tst2 = "Bob: How are you today?"
res2 = re.search(r"\w+:\s+?(.+)?\s+?#?.*?", tst2)
res2.group(1)  # 'How are you' ("today?" is dropped: \s+? needs whitespace after the group)

Upvotes: 0

Wiktor Stribiżew

Reputation: 627371

You may try using

(?<=:\s).*?(?=\s*#line:.*|$)

See this regex demo. Details:

  • (?<=:\s) - a location immediately preceded by : and a whitespace character
  • .*? - any 0 or more chars other than line break chars, as few as possible
  • (?=\s*#line:.*|$) - a location immediately followed by either 0+ whitespaces and the #line: string, or the end of the string.
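A Python sketch of this lookaround pattern; since both the start and the end conditions are zero-width, the whole match is the text itself.

```python
import re

# Lookbehind for ": ", lazy text, lookahead for an optional "#line:" or the end.
pattern = re.compile(r"(?<=:\s).*?(?=\s*#line:.*|$)")

lines = ["Sally: Hello there #line:34de2f", "Bob: How are you today?"]
texts = [pattern.search(line).group(0) for line in lines]
print(texts)  # ['Hello there', 'How are you today?']
```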

You may also use

:\s*(.*?)(?:\s*#line:.*)?$

See the regex demo. Get the contents in Group 1.

Details:

  • :\s* - a colon and then 0 or more whitespaces
  • (.*?) - Capturing group #1: any zero or more chars other than line break chars, as few as possible
  • (?:\s*#line:.*)? - an optional sequence of
    • \s* - 0+ whitespaces
    • #line: - a literal #line: string
    • .* - any zero or more chars other than line break chars, as many as possible
  • $ - end of string.
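A Python sketch of the capturing-group version, first line by line and then over one multi-line string. The multi-line variant assumes the dialogue is stored one speaker per line; compiling with `re.MULTILINE` makes `$` match at every line end, and `findall` returns group 1 of each match.

```python
import re

# The text is in group 1; the optional non-capturing group eats "#line:...".
pattern = re.compile(r":\s*(.*?)(?:\s*#line:.*)?$")

lines = ["Sally: Hello there #line:34de2f", "Bob: How are you today?"]
texts = [pattern.search(line).group(1) for line in lines]
print(texts)  # ['Hello there', 'How are you today?']

# Same regex over a whole block: MULTILINE lets $ match before each newline.
multi = re.compile(r":\s*(.*?)(?:\s*#line:.*)?$", re.MULTILINE)
dialogue = "\n".join(lines)
all_texts = multi.findall(dialogue)
print(all_texts)  # ['Hello there', 'How are you today?']
```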

Upvotes: 1
