egidra

Reputation: 9087

Excluding terms inside parentheses using regex

I want to capture only the capitalized words that are not inside parentheses:

Reggie (Reginald) Potter -> Reggie Potter

I am using this regex:

import re
test = re.findall(r'([A-Z][a-z]+(?:\s\(.*?\))?(?=\s[A-Z])(?:\s[A-Z][a-z]+)+)', 'Reggie (Reginald) Potter')

I get this back:

Reggie (Reginald) Potter

I thought that since this group is non-capturing:

(?:\s\(.*?\))

I wouldn't get back anything inside the parentheses.

Upvotes: 0

Views: 253

Answers (2)

Adam Mihalcin

Reputation: 14458

I would use a simpler regex plus a list comprehension:

import re
all_words = re.findall(r'(\(?\b[A-Z][a-z]+\b\)?)', 'Reggie (Reginald) Potter')
# Drop any match that is wrapped in parentheses.
good_matches = [word for word in all_words if not (word[0] == '(' and word[-1] == ')')]

Now good_matches is ['Reggie', 'Potter'], as expected.
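
To get back the single string shown in the question, the filtered words can be joined with spaces. This is only a minimal sketch of the same match-then-filter idea; the helper name strip_parenthesized is hypothetical and not part of the re module.

```python
import re

def strip_parenthesized(text):
    # Hypothetical helper: match each capitalized word, optionally
    # wrapped in parentheses.
    all_words = re.findall(r'\(?\b[A-Z][a-z]+\b\)?', text)
    # Keep only the words that are not wrapped in parentheses.
    kept = [w for w in all_words
            if not (w.startswith('(') and w.endswith(')'))]
    # Rebuild a single space-separated string.
    return ' '.join(kept)

print(strip_parenthesized('Reggie (Reginald) Potter'))  # Reggie Potter
```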

Upvotes: 0

Qtax

Reputation: 33908

If the words you want to avoid are directly adjacent to parentheses, you could use negative look-behinds and look-aheads to match the ones that are not in parentheses:

(?<!\()\b([A-Z][a-z]+)\b(?!\))
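
A quick way to try the pattern is with re.findall. The sketch below is only an illustration: the second input string is made up to show the "directly adjacent" caveat, where a word in the middle of a longer parenthesized group touches neither parenthesis and is still matched.

```python
import re

# Capitalized word not immediately preceded by "(" and not
# immediately followed by ")".
pattern = re.compile(r'(?<!\()\b([A-Z][a-z]+)\b(?!\))')

words = pattern.findall('Reggie (Reginald) Potter')
print(' '.join(words))  # Reggie Potter

# Made-up input: "James" is inside the parentheses but adjacent
# to neither one, so the look-arounds do not exclude it.
leaky = pattern.findall('Reggie (Reginald James Smith) Potter')
print(leaky)  # ['Reggie', 'James', 'Potter']
```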

Upvotes: 2
