Reputation: 526
I want to match all-uppercase words that are in the middle of a sentence, using Python 3. This is my current regex:
.+?\b([A-Z]+)\b(?=[^.!?][^ ])
I want to avoid matching words that are followed by one of the characters .!? and then a space; that is what the lookahead (?=[^.!?][^ ]) is meant to do. But this expression also excludes a word followed by a period and no space. What is my mistake?
I.e., at the moment I get the same result using re.findall() with and without a space at the end of the searched string:
>>> re.findall(r'.+?\b([A-Z]+)\b(?=[^.!?][^ ])','NO YES YES YES YES NO. ')
['YES', 'YES', 'YES', 'YES']
>>> re.findall(r'.+?\b([A-Z]+)\b(?=[^.!?][^ ])','NO YES YES YES YES NO.')
['YES', 'YES', 'YES', 'YES']
Upvotes: 2
Views: 699
Reputation: 5658
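import re
# The single capture group takes everything between the first and last word,
# so findall returns one string here rather than a list of separate words.
print(re.findall(r'[^A-Z](.+)[^A-Z]\S+\s*$','NO YES YES YES YES NO. '))
# ['YES YES YES YES']
print(re.findall(r'[^A-Z](.+)[^A-Z]\S+\s*$','NO YES YES YES YES NO.'))
# ['YES YES YES YES']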
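The pattern above returns the middle of the input as one string. If a list of separate words is needed, as in the question, one option is to split that string on whitespace. This is only a minimal sketch: the helper name `middle_words` is made up for illustration, and it does not check that each word is actually uppercase.

```python
import re

def middle_words(text):
    # Hypothetical helper: capture the middle of the input, then split it.
    found = re.findall(r'[^A-Z](.+)[^A-Z]\S+\s*$', text)
    # findall returns a list; take the first entry if there is one
    # and split it on runs of whitespace.
    return found[0].split() if found else []

print(middle_words('NO YES YES YES YES NO. '))
print(middle_words('NO YES YES YES YES NO.'))
```

Both calls should give the same list of four words, since the trailing space is absorbed by `\s*`.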
Upvotes: 0
Reputation: 784998
Try this regex with negative lookahead:
r'(?!^)\b([A-Z]+)\b(?![.!?] )'
(?!^) will skip a word at the start of the string.
(?![.!?] ) will fail the match when the word is followed by one of those characters and then a space.
Examples:
>>> re.findall(r'(?!^)\b([A-Z]+)\b(?![.!?] )','NO YES YES YES YES NO.')
['YES', 'YES', 'YES', 'YES', 'NO']
>>> re.findall(r'(?!^)\b([A-Z]+)\b(?![.!?] )','NO YES YES YES YES NO. ')
['YES', 'YES', 'YES', 'YES']
Upvotes: 1