T.BJ

Reputation: 99

Regex: Match up to, but not including word, and still match if word does not exist

I have files in the following format:

123_example-1234567_any-characters_here!-ignore.ext

And I want to capture the four groups:

  1. 123_example
  2. 1234567
  3. any-characters_here!
  4. .ext

Which I can do just fine with something like

^(\d{3}_[^\-]+)-(\d+)_(.+)-ignore(\.ext)$

However, sometimes these files do not have the -ignore string (assume this string can only ever be -ignore). For example:

123_example-1234567_any-characters_here!.ext

How can I modify my regex so that it matches both strings and returns the same groups?

My attempt is at https://regex101.com/r/pOnEIe/1, where I thought a capture group inside a non-capturing group might be the answer.

Upvotes: 0

Views: 130

Answers (1)

The fourth bird

Reputation: 163237

The third capture group should use a non-greedy quantifier, and it should be followed by an optional non-capturing group for -ignore.

Note that the negated character class [^-]+ might also match newlines.

^(\d{3}_[^-]+)-(\d+)_(.+?)(?:-ignore)?(\.ext)$
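To see why the non-greedy quantifier matters, here is a small sketch comparing a greedy and a lazy third group. It assumes Python's re module (the question does not name a regex flavor), and the variable names are only illustrative.

```python
import re

name = "123_example-1234567_any-characters_here!-ignore.ext"

# Greedy (.+): the third group grabs as much as it can, so the optional
# (?:-ignore)? group is skipped and "-ignore" ends up inside group 3.
greedy = re.match(r"^(\d{3}_[^-]+)-(\d+)_(.+)(?:-ignore)?(\.ext)$", name)

# Lazy (.+?): the third group stops as early as possible, which lets the
# optional group consume "-ignore" when it is present.
lazy = re.match(r"^(\d{3}_[^-]+)-(\d+)_(.+?)(?:-ignore)?(\.ext)$", name)

print(greedy.group(3))  # any-characters_here!-ignore
print(lazy.group(3))    # any-characters_here!
```

Both patterns match the string; only the contents of the third group differ.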

Explanation

  • ^ Start of string
  • (\d{3}_[^-]+) Capture group 1: match 3 digits, _ and 1+ chars other than -
  • -(\d+)_ Match -, capture 1+ digits and match _
  • (.+?) Capture 1+ chars, as few as possible
  • (?:-ignore)? Optionally match -ignore
  • (\.ext) Capture .ext
  • $ End of string
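As a quick check, the pattern can be run against both file names from the question. This is a minimal sketch assuming Python's re module; PATTERN and split_name are hypothetical names used only for illustration.

```python
import re

# The pattern from this answer, compiled once for reuse.
PATTERN = re.compile(r"^(\d{3}_[^-]+)-(\d+)_(.+?)(?:-ignore)?(\.ext)$")


def split_name(filename):
    """Return the four captured groups as a tuple, or None if there is no match."""
    m = PATTERN.match(filename)
    return m.groups() if m else None


names = [
    "123_example-1234567_any-characters_here!-ignore.ext",
    "123_example-1234567_any-characters_here!.ext",
]

for name in names:
    # Both lines print:
    # ('123_example', '1234567', 'any-characters_here!', '.ext')
    print(split_name(name))
```

Both inputs yield the same four groups, with -ignore left out of the third group when it is present.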


Upvotes: 2
