Capture the n previous words when matching a string

Question

Let's say I have this text:

abcdefg Mark Jones (PP) etc etc
akslaskAS Taylor Daniel Lautner (PMB) blabla
etcetc Allan Stewart Konigsberg Farrow (PRTW)

I want to capture these personal names:

Mark Jones, Taylor Daniel Lautner, Allan Stewart Konigsberg Farrow.

Basically, whenever "(P" is followed by a capital letter, I want to capture the run of preceding words that each start with a capital letter.

So far I can capture only the single preceding word, with this pattern: \w+(?=\s+(\(P+[A-Z])). I couldn't get any further than that. I'd appreciate any help :)

Shubham Sharma · Accepted Answer

Regex pattern

\b((?:[A-Z]\w+\s?)+)\s\(P[A-Z]

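To find every occurrence of this pattern, we can use re.findall. Because the pattern contains exactly one capturing group, re.findall returns the text of that group (the name) rather than the whole match: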

import re

text = """abcdefg Mark Jones (PP) etc etc
akslaskAS Taylor Daniel Lautner (PMB) blabla
etcetc Allan Stewart Konigsberg Farrow (PRTW)
"""

matches = re.findall(r'\b((?:[A-Z]\w+\s?)+)\s\(P[A-Z]', text)

>>> matches
['Mark Jones', 'Taylor Daniel Lautner', 'Allan Stewart Konigsberg Farrow']
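One edge case may be worth checking against your real data: \s also matches a newline, so a capitalized word at the end of one line can be pulled into a name that starts on the next line. The sketch below uses a made-up two-line input to show this, and compares it with a variant that accepts only literal spaces. The variant is an assumption about what you want (names that never span lines), not part of the pattern above.

```python
import re

# Hypothetical input: a capitalized word ends the line just before the name.
text = "see Smith\nMark Jones (PP) etc"

# Original pattern: \s also matches the newline, so the match spans two lines.
loose = re.findall(r'\b((?:[A-Z]\w+\s?)+)\s\(P[A-Z]', text)

# Variant (an assumption, not the pattern from the answer): literal spaces only.
strict = re.findall(r'\b((?:[A-Z]\w+ ?)+) \(P[A-Z]', text)

print(loose)   # ['Smith\nMark Jones']
print(strict)  # ['Mark Jones']
```

If the names in your text can legitimately wrap across lines, the original pattern is the better fit.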

Regex details

\b : Word boundary, so a match cannot start in the middle of a word
((?:[A-Z]\w+\s?)+) : First capturing group (the name)
- (?:[A-Z]\w+\s?)+ : Non-capturing group, matched one or more times
  - [A-Z] : Matches a single uppercase letter from A to Z
  - \w+ : Matches one or more word characters (so each word needs at least two characters)
  - \s? : Matches zero or one whitespace character
\s : Matches a single whitespace character
\( : Matches the character ( literally
P : Matches the character P literally
[A-Z] : Matches a single uppercase letter from A to Z
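The same breakdown can be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace and allows # comments inside the pattern. This is only a sketch of an equivalent way to write the pattern; the name name_before_code is made up for illustration.

```python
import re

# Same pattern as above, written with re.VERBOSE so each piece is commented.
name_before_code = re.compile(r"""
    \b              # word boundary, so a match cannot start mid-word
    (               # group 1: the run of capitalized words
      (?:
        [A-Z]       # an uppercase letter A to Z
        \w+         # one or more further word characters
        \s?         # an optional whitespace character after the word
      )+            # one or more such words
    )
    \s              # the whitespace character before the parenthesis
    \(              # a literal opening parenthesis
    P               # a literal P
    [A-Z]           # an uppercase letter A to Z
""", re.VERBOSE)

text = "abcdefg Mark Jones (PP) etc etc"
print(name_before_code.findall(text))  # ['Mark Jones']
```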

See the online regex demo
