Why does Regex finditer only return the first result

Question

My string is a transcript. I want to capture each speaker, specifically their surname (which should match only when fully capitalised), together with their speech up to the point where the next speaker begins. Eventually I want to loop this process over a huge text file.

The problem is that finditer returns only one match object, even though there are two different speakers. I have also tried online regex testers with the Python flavor, but they return very different results (not sure why).

import re

# Named `text` rather than `str` to avoid shadowing the built-in type.
text = 'Senator BACK\n (Western Australia) (21:15): This evening I had the pleasure (...) Senator         DAY\n (South Australia) (21:34): Well, what a week it h(...) '

# Raw string so the backslashes reach the regex engine unchanged.
pattern = re.compile(r"(:?(Senator|Mr|Dr)\s+([A-Z]{2,})\s*(\(.+?\))\s+(\(\d{2}:\d{2}\):)(.*))(?=Senator)")

for match in pattern.finditer(text):
    print(match)

I want two match objects, each with a group for the speaker's surname and a group for their speech. Note also that the online regex debuggers, set to the Python flavor, give different results from Python in my terminal.

Allan · Accepted Answer

Just replace the regex with:

(:?(Senator|Mr|Dr)\s+([A-Z]{2,})\s*(\(.+?\))\s+(\(\d{2}:\d{2}\):)(.*))(?=Senator|$)

demo: https://regex101.com/r/gJDaWM/1/

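With your current regex, the positive lookahead (?=Senator) requires every match to be followed by Senator. The last speaker is followed by the end of the string instead, so that entry can never match. Adding |$ to the lookahead lets it also succeed at the end of the string.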
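To see the difference, here is a minimal sketch that runs both lookaheads over the string from the question. The names `body`, `original` and `fixed` are only illustrative, and the parentheses in the pattern are written as `\(` and `\)` on the assumption that they are meant to match the literal brackets in the transcript.

```python
import re

text = ('Senator BACK\n (Western Australia) (21:15): This evening I had the pleasure (...) '
        'Senator         DAY\n (South Australia) (21:34): Well, what a week it h(...) ')

# Shared part of the pattern; group 3 is the surname, group 6 the speech.
body = r"(:?(Senator|Mr|Dr)\s+([A-Z]{2,})\s*(\(.+?\))\s+(\(\d{2}:\d{2}\):)(.*))"

original = re.compile(body + r"(?=Senator)")    # must be followed by "Senator"
fixed = re.compile(body + r"(?=Senator|$)")     # or by the end of the string

original_surnames = [m.group(3) for m in original.finditer(text)]
fixed_surnames = [m.group(3) for m in fixed.finditer(text)]

print(original_surnames)  # ['BACK']
print(fixed_surnames)     # ['BACK', 'DAY']
```

The second speaker only shows up once the lookahead is allowed to succeed at the end of the string.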

You might actually have to change the positive lookahead to:

(?=Senator|Mr|Dr|$)

if you want to take into account Mr and Dr on top of Senator.
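As a rough illustration of the extended lookahead, the sketch below collects surname and speech pairs from a sample where the second speaker is a "Mr". The "Mr SMITH" entry is invented for this example and is not part of the original transcript; the variable names are also just placeholders.

```python
import re

# Hypothetical sample: the "Mr SMITH" entry is made up for illustration.
text = ('Senator BACK\n (Western Australia) (21:15): This evening I had the pleasure (...) '
        'Mr SMITH\n (Victoria) (21:40): Thank you (...) ')

pattern = re.compile(
    r"(:?(Senator|Mr|Dr)\s+([A-Z]{2,})\s*(\(.+?\))\s+(\(\d{2}:\d{2}\):)(.*))"
    r"(?=Senator|Mr|Dr|$)"  # stop before the next title or at the end of the string
)

# Group 3 is the surname, group 6 the speech (stripped of surrounding spaces).
speeches = [(m.group(3), m.group(6).strip()) for m in pattern.finditer(text)]

for surname, speech in speeches:
    print(surname, '->', speech)
```

Keep in mind that `.` does not match a newline unless `re.DOTALL` is set, so a speech that spans several lines would be cut off at the first line break. The bare `Mr` and `Dr` alternatives can also match inside ordinary words, so on a real transcript the lookahead may need tightening.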
