Reputation: 13401
I'm trying to extract the following patterns from text using a regex:
John Doe
JOHN DOE
Sam John Watson
Sam John Lilly Watson
SAM JOHN WATSON
SAM JOHN LILLY WATSON
The input data contains only a single line, and I need to find the above patterns in it.
More about Pattern
What I Tried:
import re
re.findall("[A-Z][A-Za-z]+ [A-Z][A-Za-z]+ [A-Za-z]* [A-Za-z]*", text)
This correctly matches input like:
Sam Peters John Doe
SAM WINCH DAN BROWN
but fails on input with fewer than 4 words.
Upvotes: 1
Views: 57
Reputation: 370739
Your pattern is failing because, even with the * quantifiers after the last two character sets, the spaces before those last two character sets are not optional. So, for example, a string with only 2 words would match only if those two words were followed by two spaces.
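To see the problem concretely, here is a small sketch that runs the pattern from the question against a few inputs. The variable names are only illustrative, and the padded input is a made-up string to show the mandatory spaces.

```python
import re

# The pattern from the question: both trailing spaces are mandatory,
# even though the words after them are optional.
pattern = r"[A-Z][A-Za-z]+ [A-Z][A-Za-z]+ [A-Za-z]* [A-Za-z]*"

# Four words: every mandatory space is present, so this matches.
four_words = re.findall(pattern, "Sam Peters John Doe")

# Two words: there is nothing to satisfy the two trailing spaces.
two_words = re.findall(pattern, "John Doe")

# Two words followed by two spaces: the empty "words" match after each space.
two_words_padded = re.findall(pattern, "John Doe  ")

print(four_words)
print(two_words)
print(two_words_padded)
```

The two-word input only matches once two spaces are appended, which is almost certainly not what you want.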
I'd suggest that you start with [A-Z][A-Za-z]+ for the first word, then repeat a space followed by a word up to 3 times:
^[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+){1,3}$
https://regex101.com/r/IvSvAH/1
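As a rough sketch of how this could be used from Python, assuming the whole input line is supposed to be the name (that is what the ^ and $ anchors require). The helper name is_name and the sample list are just for illustration.

```python
import re

# First word, then 1 to 3 repetitions of "space + word".
NAME_RE = re.compile(r"^[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+){1,3}$")

def is_name(text):
    """Return True if the whole line is 2 to 4 capitalized words."""
    return NAME_RE.search(text) is not None

samples = [
    "John Doe",
    "JOHN DOE",
    "Sam John Watson",
    "Sam John Lilly Watson",
    "SAM JOHN WATSON",
    "SAM JOHN LILLY WATSON",
]

for s in samples:
    print(s, is_name(s))
```

A single word, five or more words, or a word starting with a lowercase letter will not match.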
If there may be words with only one character (like "I" or "A"), then repeat the [A-Za-z] character sets with * instead of +.
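A quick comparison of the two variants, as a sketch. The names PLUS_RE and STAR_RE and the test strings are made up for the example.

```python
import re

# Each word needs at least two letters.
PLUS_RE = re.compile(r"^[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+){1,3}$")

# Each word may be a single capital letter.
STAR_RE = re.compile(r"^[A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*){1,3}$")

text = "John A Doe"

plus_matches = PLUS_RE.search(text) is not None
star_matches = STAR_RE.search(text) is not None

print(plus_matches)
print(star_matches)
```

Only the * version accepts the single-letter middle word.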
Upvotes: 3