Reputation: 11784
Raw Data:
# Case 1
1980 (reprint 1987)
# Case 2
1980 (1987 reprint)
Capture Group:
{
publish: 1980,
reprint: 1987
}
Requirement:
Current Approach:
# Match 2nd case but not the 1st case.
(?P<publish>\d{4}).*(?P<reprint>\d{4}(?=\sreprint.*))
# Match 1st case but not the 2nd case
(?P<publish>\d{4}).*(?<=reprint\s)(?P<reprint>\d{4})
I am not sure how to merge the two regexes above, so I currently have to run the matching twice. An answer showing how to match both cases with a single regex would be far better.
Upvotes: 3
Views: 76
Reputation: 107287
All you need is to specify the parentheses and combine both regexes:
r'(?P<publish>\d{4})\s\(.*(?P<reprint>\d{4}).*\)'
Demo:
>>> [i.groupdict() for i in re.finditer(r'(?P<publish>\d{4})\s\(.*(?P<reprint>\d{4}).*\)', s)]
[{'reprint': '1987', 'publish': '1980'}, {'reprint': '1987', 'publish': '1980'}]
If the existence of reprint within the parentheses is necessary, you can use a positive lookahead to enforce it:
>>> s2 = """# Case 1
... 1980 (reprint 1723)
...
... # Case 2
... 1980 (1987 reprint)"""
>>>
>>> [i.groupdict() for i in re.finditer(r'(?P<publish>\d{4})\s\(((?=reprint).*)?(?P<reprint>\d{4})((?=\sreprint).*)?\)', s2)]
[{'reprint': '1723', 'publish': '1980'}, {'reprint': '1987', 'publish': '1980'}]
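One caveat that seems worth checking: both lookahead groups in the pattern above are optional, so it also accepts a year in parentheses with no reprint at all. The sketch below compares it with a stricter variant built on plain alternation. The names `strict`, `before`, `after`, and `parse` are hypothetical and only serve the illustration; two group names are used because Python's `re` does not allow the same name twice in one pattern.

```python
import re

# Pattern from the answer above: both lookahead groups are optional.
optional = re.compile(
    r'(?P<publish>\d{4})\s\(((?=reprint).*)?(?P<reprint>\d{4})((?=\sreprint).*)?\)'
)

# Hypothetical stricter variant: "reprint" must appear either directly
# before or directly after the year inside the parentheses.
strict = re.compile(
    r'(?P<publish>\d{4})\s\('
    r'(?:reprint\s(?P<before>\d{4})|(?P<after>\d{4})\sreprint)'
    r'\)'
)

def parse(line):
    """Return publish/reprint years, or None if "reprint" is missing."""
    m = strict.search(line)
    if m is None:
        return None
    # Exactly one of the two alternatives participated in the match.
    return {'publish': m.group('publish'),
            'reprint': m.group('before') or m.group('after')}

# The optional pattern still matches when "reprint" is absent.
print(optional.search("1980 (1987)") is not None)
print(parse("1980 (reprint 1723)"))
print(parse("1980 (1987 reprint)"))
print(parse("1980 (1987)"))
```

If a bare year in parentheses should be rejected, the stricter form is probably the safer choice.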
Upvotes: 1
Reputation: 785058
You can use this single regex with alternation. The reprint group will match if it is followed by \sreprint (asserted by a positive lookahead) or if it is preceded by reprint\s (asserted by a positive lookbehind).
(?P<publish>\d{4}).*?(?P<reprint>(?:\d{4}(?=\sreprint)|(?<=reprint\s)\d{4}))
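A minimal sketch of this pattern in Python, assuming the input looks like the raw data in the question (the sample string `s` is reconstructed here, not taken from the answer). The lookbehind `reprint\s` has a fixed width, which Python's `re` module requires.

```python
import re

# Pattern copied from the answer above, split over two lines for readability.
pattern = re.compile(
    r'(?P<publish>\d{4}).*?'
    r'(?P<reprint>(?:\d{4}(?=\sreprint)|(?<=reprint\s)\d{4}))'
)

# Sample text assumed to mirror the question's raw data.
s = """# Case 1
1980 (reprint 1987)
# Case 2
1980 (1987 reprint)"""

# One pass over the text yields both cases.
results = [m.groupdict() for m in pattern.finditer(s)]
print(results)
```

Because the reprint year must sit next to the word reprint, a line such as `1980 (1987)` is not matched.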
Upvotes: 3
Reputation: 27476
Maybe just:
(?P<publish>\d{4}).*(?:reprint )?(?P<reprint>\d{4})(?: reprint)?
https://regex101.com/r/lX7hK5/1
This assumes that reprint can appear before or after the date, so it would also match e.g. 1980 (reprint 1987 reprint). Your raw data suggests that it appears in only one place, so it will work.
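To make that permissiveness concrete, here is a small sketch that runs the pattern over a few inputs. The `samples` list is an assumption for illustration: the first two lines come from the question, the last two are made up to probe the edge cases.

```python
import re

# Pattern copied from the answer above.
pattern = re.compile(
    r'(?P<publish>\d{4}).*(?:reprint )?(?P<reprint>\d{4})(?: reprint)?'
)

samples = [
    "1980 (reprint 1987)",          # Case 1 from the question
    "1980 (1987 reprint)",          # Case 2 from the question
    "1980 (reprint 1987 reprint)",  # hypothetical: word on both sides
    "1980 (1987)",                  # hypothetical: word missing entirely
]

# Both optional groups may match nothing, so every sample is accepted.
matches = {line: pattern.search(line).groupdict() for line in samples}
for line, groups in matches.items():
    print(line, '->', groups)
```

Since both `reprint` groups are optional, the pattern effectively captures the first and last four-digit runs on the line, which is fine as long as the data stays as regular as the question suggests.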
Upvotes: 1