Reputation: 68
I am trying to write a regex that matches one pattern and then captures all the years that follow it.
For example, given the following strings, I am trying to write a single regular expression that captures 'pattern1' and then the dates that follow it (at most 2), if they exist.
"'pattern1' foobar 4 foo 1 bar foo 1900 and 2000"
"'pattern1' foobar 4 foo 1 bar foo 1900"
"'pattern1' foobar 4 foo 1 bar foo"
The following expression matches the first string, but fails as soon as a date is removed:
('pattern1').*?(\d{4}).*?(\d{4})
Adding ? after each date group makes the groups optional, but then the expression only matches the pattern, because it is already satisfied without capturing any dates:
('pattern1').*?(\d{4})?.*?(\d{4})?
So my problem is that I cannot specify a group that is allowed to be absent but is still captured when it is present.
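To see both failures side by side, here is a minimal sketch that runs the two expressions from the question against the three sample strings. It assumes Python's standard re module (the question does not name a language); re.search returns None when nothing matches, and a group that took no part in the match is reported as None.

```python
import re

SAMPLES = [
    "'pattern1' foobar 4 foo 1 bar foo 1900 and 2000",
    "'pattern1' foobar 4 foo 1 bar foo 1900",
    "'pattern1' foobar 4 foo 1 bar foo",
]

# Both year groups are mandatory, so only a string with two 4-digit runs matches.
strict = re.compile(r"('pattern1').*?(\d{4}).*?(\d{4})")

# Both year groups are optional: the lazy .*? first matches nothing, each
# optional group then matches the empty string, and the match succeeds
# right after 'pattern1' without ever reaching the years.
loose = re.compile(r"('pattern1').*?(\d{4})?.*?(\d{4})?")


def groups_or_none(pattern, text):
    """Return the captured groups, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.groups() if m else None


for s in SAMPLES:
    print(groups_or_none(strict, s), groups_or_none(loose, s))
```

The loose version returns ("'pattern1'", None, None) for every sample, which is exactly the "only matches the pattern" behaviour described above.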
Upvotes: 0
Views: 46
Reputation: 163447
You could make both parts optional and put word boundaries around the digits to prevent them from matching as part of a larger word.
If you want to match more years, you would have to add more optional groups. In that case, I would suggest an approach like the one in @Alexander Mashin's answer.
('pattern1')(?:.*?(\b\d{4}\b)(?:.*?(\b\d{4}\b))?)?
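A minimal sketch of this expression in action, assuming Python's re module (only the raw-string quoting is Python-specific; the pattern is unchanged). Because the outer optional group is greedy, the engine first tries to find a year after 'pattern1' and only skips the group when no year follows, so the captures fill in from left to right.

```python
import re

# The outer optional group holds the first year and the nested optional group
# holds the second; \b stops \d{4} from matching inside longer digit runs
# such as 12345.
YEARS = re.compile(r"('pattern1')(?:.*?(\b\d{4}\b)(?:.*?(\b\d{4}\b))?)?")


def extract(text):
    """Return (pattern, first_year, second_year); missing years come back as None."""
    m = YEARS.search(text)
    return m.groups() if m else None


for s in [
    "'pattern1' foobar 4 foo 1 bar foo 1900 and 2000",
    "'pattern1' foobar 4 foo 1 bar foo 1900",
    "'pattern1' foobar 4 foo 1 bar foo",
]:
    print(extract(s))
```

The three samples give ("'pattern1'", '1900', '2000'), ("'pattern1'", '1900', None) and ("'pattern1'", None, None).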
Upvotes: 2
Reputation: 4637
If you must solve your problem with one regular expression, simply use ^'pattern1'|\d{4} and collect all matches. The first match will be 'pattern1' at the beginning of the string; the remaining ones will be the years (four-digit years, i.e. after AD 999).
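A minimal sketch of this single-expression approach, assuming Python's re module, where findall returns every non-overlapping match. Without the MULTILINE flag, ^ only matches at the very start of the string, so the first branch can match only there, while the second branch picks up every four-digit run after it.

```python
import re

# Alternation: an anchored 'pattern1' at the start, or any 4-digit run.
TOKENS = re.compile(r"^'pattern1'|\d{4}")


def tokens(text):
    """Return all matches: 'pattern1' first (if the line starts with it), then the years."""
    return TOKENS.findall(text)


head, *years = tokens("'pattern1' foobar 4 foo 1 bar foo 1900 and 2000")
print(head, years)
```

Since a line that does not start with 'pattern1' would simply return years, it is worth checking that the first element really is 'pattern1' before unpacking.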
A more correct solution would be to match a line containing 'pattern1' and dates, capturing 'pattern1' and the tail that contains the dates (e.g. ^(?<head>'pattern1')(?<tail>(?:.*?\d{4})+.*$)), and then to match the dates in the tail (just \d{4}). But the exact code depends on your environment.
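For example, in Python the two-step version might look like the sketch below (an assumption about the environment, not the only way to do it). Python's re module spells named groups (?P<name>...), whereas the (?<name>...) form above is the PCRE/.NET/Java/JavaScript spelling; the function name head_and_years is hypothetical.

```python
import re

# Step 1: anchored match splitting the line into the head and a tail that
# contains at least one 4-digit run (Python named-group syntax).
LINE = re.compile(r"^(?P<head>'pattern1')(?P<tail>(?:.*?\d{4})+.*$)")

# Step 2: collect every 4-digit run from the tail.
YEAR = re.compile(r"\d{4}")


def head_and_years(text):
    """Hypothetical helper: return (head, [years]) or None if the line does not qualify."""
    m = LINE.match(text)
    if m is None:
        return None  # no 'pattern1' at the start, or no year after it
    return m.group("head"), YEAR.findall(m.group("tail"))


print(head_and_years("'pattern1' foobar 4 foo 1 bar foo 1900 and 2000"))
```

Because the tail group requires at least one year, the third sample string does not match at all and the function returns None. If such lines should still yield their head, changing the + to * lets the tail be any remainder, and the year list then comes back empty.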
Upvotes: 1