aja228

Reputation: 68

Matching multiple patterns without knowing whether they are in the string

I am trying to find a regex that matches one pattern and then captures all the years that follow it.
For example, I have the following strings, and I want a single regular expression that captures pattern 1 and then the following years (2 at most) if they exist.

"'pattern1' foobar 4 foo 1 bar foo 1900 and 2000"  
"'pattern1' foobar 4 foo 1 bar foo 1900"  
"'pattern1' foobar 4 foo 1 bar foo"  

The following expression matches the first case but not if a date is removed:

('pattern1').*?(\d{4}).*?(\d{4})

Adding ? after each of the date groups makes the expression match only the pattern, because it is already satisfied without matching any dates:

('pattern1').*?(\d{4})?.*?(\d{4})?

So my issue is that I cannot specify that a group may or may not be present in the string, but should be captured when it is.
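To reproduce the behaviour described above, here is a minimal sketch in Python (an assumption, since the question does not name an environment; the names ATTEMPT and try_attempt are made up for illustration). It applies the expression with ? added after each date group and returns both the overall match and the captured groups.

```python
import re

# The attempt with "?" added after each date group.
ATTEMPT = re.compile(r"('pattern1').*?(\d{4})?.*?(\d{4})?")

def try_attempt(text):
    """Return (whole match, groups), or None if nothing matches."""
    m = ATTEMPT.search(text)
    return (m.group(0), m.groups()) if m else None

sample = "'pattern1' foobar 4 foo 1 bar foo 1900 and 2000"
print(try_attempt(sample))
```

Each lazy .*? is allowed to match nothing, and each optional group is allowed to be skipped, so the engine stops right after 'pattern1' and both date groups come back as None.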

Upvotes: 0

Views: 46

Answers (2)

The fourth bird

Reputation: 163447

You could make both date parts optional by nesting them in optional non-capturing groups, and use word boundaries around the digits to prevent them from being part of a larger word.

If you want to match more years, you would have to add more optional groups. In that case, I would suggest using an approach like the one in the answer by @Alexander Mashin.

('pattern1')(?:.*?(\b\d{4}\b)(?:.*?(\b\d{4}\b))?)?
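As a rough check, the expression above can be run against the three strings from the question. This sketch assumes Python's re module (the question does not state an environment), and the names NESTED and extract are hypothetical helpers for illustration only.

```python
import re

# The regex from this answer: the second year sits inside the optional
# group that holds the first year, so neither can be skipped while a
# four-digit number is still ahead in the string.
NESTED = re.compile(r"('pattern1')(?:.*?(\b\d{4}\b)(?:.*?(\b\d{4}\b))?)?")

def extract(text):
    """Return (pattern, year1, year2); missing years are None."""
    m = NESTED.search(text)
    return m.groups() if m else None

samples = [
    "'pattern1' foobar 4 foo 1 bar foo 1900 and 2000",
    "'pattern1' foobar 4 foo 1 bar foo 1900",
    "'pattern1' foobar 4 foo 1 bar foo",
]
for s in samples:
    print(extract(s))
```

The three calls should give both years, only the first year, and no years respectively, with None standing in for each group that did not take part in the match.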


Upvotes: 2

Alexander Mashin

Reputation: 4637

If you must solve your problem with a single regular expression, simply use ^'pattern1'|\d{4} and collect all of its matches. The first match will be 'pattern1' at the beginning of the string; the remaining ones will be the years (post AD 999).
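A minimal sketch of that idea, assuming Python's re module (the environment is not stated in the question); TOKEN and pattern_and_years are hypothetical names. Because the alternation would also pick up years on lines that do not start with 'pattern1', the sketch checks that the first match really is the pattern.

```python
import re

# Either 'pattern1' at the very start of the string, or any run of four digits.
TOKEN = re.compile(r"^'pattern1'|\d{4}")

def pattern_and_years(text):
    """Return (pattern, [years]) or None if the string does not start with the pattern."""
    matches = TOKEN.findall(text)
    if not matches or matches[0] != "'pattern1'":
        return None
    return matches[0], matches[1:]

print(pattern_and_years("'pattern1' foobar 4 foo 1 bar foo 1900 and 2000"))
print(pattern_and_years("'pattern1' foobar 4 foo 1 bar foo"))
```

If only the first two years are wanted, the list can be sliced with [:2]. Note that a bare \d{4} also matches inside longer digit runs; adding word boundaries would avoid that.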

A more correct solution would be to match a line containing 'pattern1' and dates, capturing 'pattern1' and the tail containing the dates (e.g. ^(?<head>'pattern1')(?<tail>(?:.*?\d{4})+.*$)), and then to match the dates in the tail (just \d{4}). But the exact code depends on your environment.
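The two-step approach might look like this in Python, which is only one possible environment. Python spells named groups (?P<name>...) rather than (?<name>...), so the expression is adapted accordingly; LINE, YEAR and split_then_scan are hypothetical names.

```python
import re

# Step 1: split the line into the pattern (head) and everything after it (tail).
# As written, the tail must contain at least one four-digit number.
LINE = re.compile(r"^(?P<head>'pattern1')(?P<tail>(?:.*?\d{4})+.*$)")
# Step 2: pick every four-digit number out of the tail.
YEAR = re.compile(r"\d{4}")

def split_then_scan(text):
    """Return (head, [years]) or None if the line does not match."""
    m = LINE.match(text)
    if m is None:
        return None
    return m.group("head"), YEAR.findall(m.group("tail"))

print(split_then_scan("'pattern1' foobar 4 foo 1 bar foo 1900 and 2000"))
```

With the + quantifier, a line that has 'pattern1' but no years does not match at all; replacing + with * would let such a line through with an empty list of years.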

Upvotes: 1
