Yeo

Reputation: 11784

How to match both patterns with one regex?

Raw Data:

# Case 1
1980 (reprint 1987)

# Case 2
1980 (1987 reprint)

Desired capture groups:

{
    publish: 1980,
    reprint: 1987
}

Requirement:

  1. Match the word "reprint" and treat the year next to it as the reprint year.
  2. The first matching year is always the publish year.

Current Approach:

# Matches the 2nd case but not the 1st case
(?P<publish>\d{4}).*(?P<reprint>\d{4}(?=\sreprint.*))

# Matches the 1st case but not the 2nd case
(?P<publish>\d{4}).*(?<=reprint\s)(?P<reprint>\d{4})

I am not sure how to merge the two regexes above, so for now I have to run the match twice. An answer showing how to match both cases with a single regex would be far better.
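
For reference, the two-pass workaround described above could be sketched roughly as follows. The names AFTER, BEFORE and match_two_pass are illustrative only and are not part of the original code.

```python
import re

# The two patterns from "Current Approach", compiled once.
# AFTER handles "1987 reprint"; BEFORE handles "reprint 1987".
AFTER = re.compile(r'(?P<publish>\d{4}).*(?P<reprint>\d{4}(?=\sreprint.*))')
BEFORE = re.compile(r'(?P<publish>\d{4}).*(?<=reprint\s)(?P<reprint>\d{4})')

def match_two_pass(text):
    """Try each pattern in turn and return the first match's groups, or None."""
    for pattern in (AFTER, BEFORE):
        m = pattern.search(text)
        if m:
            return m.groupdict()
    return None

print(match_two_pass("1980 (reprint 1987)"))
print(match_two_pass("1980 (1987 reprint)"))
```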

Upvotes: 3

Views: 76

Answers (3)

Kasravnd

Reputation: 107287

All you need is to match the parentheses explicitly and combine both regexes:

r'(?P<publish>\d{4})\s\(.*(?P<reprint>\d{4}).*\)'

Demo, where s holds the raw data from the question:

>>> import re
>>> [i.groupdict() for i in re.finditer(r'(?P<publish>\d{4})\s\(.*(?P<reprint>\d{4}).*\)', s)]
[{'reprint': '1987', 'publish': '1980'}, {'reprint': '1987', 'publish': '1980'}]

If reprint must appear within the parentheses, you can use a positive lookahead to check for it:

>>> s2 = """# Case 1
... 1980 (reprint 1723)
... 
... # Case 2
... 1980 (1987 reprint)"""

>>> [i.groupdict() for i in re.finditer(r'(?P<publish>\d{4})\s\(((?=reprint).*)?(?P<reprint>\d{4})((?=\sreprint).*)?\)', s2)]
[{'reprint': '1723', 'publish': '1980'}, {'reprint': '1987', 'publish': '1980'}]
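
One caveat worth checking: both lookahead groups in the pattern above are optional, so an input such as 1980 (1987) still matches. The snippet below is a minimal sketch that compares the pattern from this answer with a stricter variant. STRICT is a hypothetical pattern, not from the answer, and it assumes the word reprint sits directly next to the year inside the parentheses.

```python
import re

# Pattern from the answer: both lookahead groups are optional.
LENIENT = re.compile(
    r'(?P<publish>\d{4})\s\(((?=reprint).*)?(?P<reprint>\d{4})((?=\sreprint).*)?\)'
)

# Hypothetical stricter variant (an assumption, not from the answer):
# the reprint year must be directly preceded or followed by "reprint".
STRICT = re.compile(
    r'(?P<publish>\d{4})\s\((?:reprint\s)?'
    r'(?P<reprint>\d{4}(?=\sreprint)|(?<=reprint\s)\d{4})'
    r'(?:\sreprint)?\)'
)

def matches(pattern, text):
    """Return the named groups of the first match, or None."""
    m = pattern.search(text)
    return {k: m.group(k) for k in ('publish', 'reprint')} if m else None

for text in ("1980 (reprint 1987)", "1980 (1987 reprint)", "1980 (1987)"):
    print(text, matches(LENIENT, text), matches(STRICT, text))
```

With the stricter variant, only the two inputs that contain the word reprint produce a match.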

Upvotes: 1

anubhava

Reputation: 785058

You can use this single regex with alternation. The reprint group matches a year if it is followed by \sreprint (asserted by a positive lookahead) or if it is preceded by reprint\s (asserted by a positive lookbehind).

(?P<publish>\d{4}).*?(?P<reprint>(?:\d{4}(?=\sreprint)|(?<=reprint\s)\d{4}))
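
A short sketch of how this pattern might be used from Python. The helper name parse_years and the conversion to int are assumptions added for illustration.

```python
import re

# Pattern from the answer above, split across two raw strings for readability.
PATTERN = re.compile(
    r'(?P<publish>\d{4}).*?'
    r'(?P<reprint>(?:\d{4}(?=\sreprint)|(?<=reprint\s)\d{4}))'
)

def parse_years(line):
    """Return {'publish': int, 'reprint': int}, or None when nothing matches."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return {name: int(value) for name, value in m.groupdict().items()}

print(parse_years("1980 (reprint 1987)"))
print(parse_years("1980 (1987 reprint)"))
print(parse_years("1980 (1987)"))  # no "reprint" keyword next to the year
```

Because the lookbehind reprint\s has a fixed width, it is accepted by Python's re module.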


Upvotes: 3

Krzysztof Krasoń

Reputation: 27476

Maybe just:

(?P<publish>\d{4}).*(?:reprint )?(?P<reprint>\d{4})(?: reprint)?

https://regex101.com/r/lX7hK5/1

This allows reprint to appear before the year, after it, or both (e.g. 1980 (reprint 1987 reprint)). Your raw data suggests it only ever appears in one place, so the pattern will work.
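
A quick check of this pattern against both cases, as a sketch (the helper name years is made up for illustration). Since both reprint groups are optional, the pattern also accepts a second year with no reprint keyword at all, which may or may not be acceptable for your data.

```python
import re

# Pattern from the answer above.
PATTERN = re.compile(
    r'(?P<publish>\d{4}).*(?:reprint )?(?P<reprint>\d{4})(?: reprint)?'
)

def years(text):
    """Return the named groups of the first match, or None."""
    m = PATTERN.search(text)
    return m.groupdict() if m else None

print(years("1980 (reprint 1987)"))
print(years("1980 (1987 reprint)"))
print(years("1980 (1987)"))  # also matches: the keyword is not required
print(years("1980"))         # a single year gives no match
```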

Upvotes: 1
