How to find all words with first letter as upper case using Python Regex

Question

I need to find all the words in a file that start with an uppercase letter. I tried the code below, but it does not find any matches.

import os
import re

matches = []

filename = 'C://Users/Documents/romeo.txt'
with open(filename, 'r') as f:
    for line in f:
        regex = "^[A-Z]\w*$"
        matches.append(re.findall(regex, line))
print(matches)

File:

Hi, How are You?

Expected output:

[Hi,How,You]

The fourth bird · Accepted Answer

You can use a word boundary (\b) instead of the anchors ^ and $, which only allow a match when the entire line is a single word:

\b[A-Z]\w*


Note that re.findall returns a list, so matches.append adds that whole list as a single item and you end up with a list of lists. Use += to add the individual matches instead.

import re

matches = []
regex = r"\b[A-Z]\w*"
filename = r'C:\Users\Documents\romeo.txt'
with open(filename, 'r') as f:
    for line in f:
        matches += re.findall(regex, line)
print(matches)

Output

['Hi', 'How', 'You']
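To see both points side by side without needing the file, here is a minimal sketch that runs the two patterns on the sample line from the question. The variable names (anchored, bounded, nested, flat) are only illustrative.

```python
import re

line = "Hi, How are You?"

# Anchored pattern: the whole line would have to be one single word.
anchored = re.findall(r"^[A-Z]\w*$", line)

# Word-boundary pattern: matches each capitalized word inside the line.
bounded = re.findall(r"\b[A-Z]\w*", line)

nested = []
nested.append(bounded)  # appends the list itself as one item

flat = []
flat += bounded         # extends the list with the individual matches

print(anchored)  # []
print(bounded)   # ['Hi', 'How', 'You']
print(nested)    # [['Hi', 'How', 'You']]
print(flat)      # ['Hi', 'How', 'You']
```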

If there should be a whitespace boundary to the left, you could also use

(?<!\S)[A-Z]\w*
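A whitespace boundary on the left is commonly written as the negative lookbehind (?<!\S), meaning "not preceded by a non-whitespace char". The sketch below assumes that form and uses a made-up sample string to show how it differs from \b for hyphenated words.

```python
import re

text = "Mary-Ann met Bob"

# \b also matches after the hyphen, so "Ann" is found as its own word.
with_word_boundary = re.findall(r"\b[A-Z]\w*", text)

# (?<!\S) requires the start of the string or whitespace on the left,
# so "Ann" (preceded by "-") is skipped.
with_whitespace_boundary = re.findall(r"(?<!\S)[A-Z]\w*", text)

print(with_word_boundary)        # ['Mary', 'Ann', 'Bob']
print(with_whitespace_boundary)  # ['Mary', 'Bob']
```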

If you don't want to match words that consist only of uppercase chars, you could use, for example, a negative lookahead to assert that what follows is not only uppercase chars up to a word boundary:
\b[A-Z](?![A-Z]*\b)\w*


\b A word boundary to prevent a partial match
[A-Z] Match an uppercase char A-Z
(?![A-Z]*\b) Negative lookahead, assert not only uppercase chars followed by a word boundary
\w* Match optional word chars
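As a quick check of the breakdown above, this sketch applies the pattern to a made-up sentence. Note one side effect: a single uppercase letter such as "I" is also skipped, because it is itself a word made up only of uppercase chars.

```python
import re

# Hypothetical sample text, chosen to include all-caps words.
text = "NASA hired Anna, and I met McGill at IBM"

pattern = r"\b[A-Z](?![A-Z]*\b)\w*"

# All-uppercase words (NASA, I, IBM) fail the lookahead and are skipped;
# mixed-case words such as McGill still match.
matches = re.findall(pattern, text)

print(matches)  # ['Anna', 'McGill']
```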


To match a word that starts with an uppercase char, and does not contain any more uppercase chars:
\b[A-Z][^\WA-Z]*\b


\b A word boundary
[A-Z] Match an uppercase char A-Z
[^\WA-Z]* Optionally match word chars other than A-Z
\b A word boundary
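A small sketch of this last pattern on a made-up sentence. Since [^\WA-Z] still allows digits and the underscore, a token like Room_9 matches as well.

```python
import re

# Hypothetical sample text with mixed-case and all-caps words.
text = "Anna met McGill and NASA staff at Room_9"

pattern = r"\b[A-Z][^\WA-Z]*\b"

# McGill and NASA contain a second uppercase char, so they do not match;
# there is no word boundary inside them where a shorter match could end.
matches = re.findall(pattern, text)

print(matches)  # ['Anna', 'Room_9']
```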

