Sociopath

Reputation: 13401

How to fetch names using RegEx for given pattern?

I'm trying to fetch the patterns below from text using RegEx:

John Doe
JOHN DOE
Sam John Watson
Sam John Lilly Watson
SAM JOHN WATSON
SAM JOHN LILLY WATSON

The input data contains only a single line, and I need to find the patterns above in it.

More about Pattern

What I Tried:

import re
re.findall("[A-Z][A-Za-z]+ [A-Z][A-Za-z]+ [A-Za-z]* [A-Za-z]*", text)

This correctly identifies input like:

Sam Peters John Doe
SAM WINCH DAN BROWN

but fails on input with fewer than 4 words.

Upvotes: 1

Views: 57

Answers (1)

CertainPerformance

Reputation: 370739

Your pattern is failing because even with the *s after the last two character sets, the spaces next to those last two character sets are not optional. So (for example) having only 2 words in the string would only match if those two words were followed by two spaces.
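
To make the failure concrete, here is a small sketch that runs the pattern from the question against a few inputs. The variable names are only illustrative; the pattern itself is copied unchanged from the question.

```python
import re

# The pattern from the question: the two literal spaces before the last
# two character sets are mandatory, even though the sets use *.
ORIGINAL_PATTERN = r"[A-Z][A-Za-z]+ [A-Z][A-Za-z]+ [A-Za-z]* [A-Za-z]*"

four_words = re.findall(ORIGINAL_PATTERN, "Sam Peters John Doe")
two_words = re.findall(ORIGINAL_PATTERN, "John Doe")
# Two words followed by two spaces: the * sets match empty strings.
two_words_padded = re.findall(ORIGINAL_PATTERN, "John Doe  ")

print(four_words)        # ['Sam Peters John Doe']
print(two_words)         # []
print(two_words_padded)  # ['John Doe  ']
```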

I'd suggest that you start with [A-Z][A-Za-z]+ for the first word, then repeat a space followed by a word one to three times:

^[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+){1,3}$

https://regex101.com/r/IvSvAH/1
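
As a rough sketch of how this could be used from Python, assuming the whole input line should be a name of two to four words: the helper name `is_name` is hypothetical, and `fullmatch` is used so that a trailing newline is not silently accepted by `$`.

```python
import re

# The suggested pattern: one capitalized word, then 1 to 3 more words,
# each preceded by a single space.
NAME_PATTERN = re.compile(r"^[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+){1,3}$")


def is_name(text):
    """Return True if the whole string is 2 to 4 capitalized words."""
    return NAME_PATTERN.fullmatch(text) is not None


for sample in ["John Doe", "SAM JOHN WATSON", "Sam John Lilly Watson", "John"]:
    print(sample, "->", is_name(sample))
```

With this pattern, each of the six example inputs from the question matches, while a single word or a line with five words does not.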

If there may be words with only one character (like "I" or "A"), then repeat the [A-Za-z] character sets with * instead of +.
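
A minimal sketch of that variant, for comparison with the stricter pattern. The names `STRICT_PATTERN` and `LOOSE_PATTERN` are just labels chosen for this example.

```python
import re

# Each word needs at least two letters.
STRICT_PATTERN = re.compile(r"^[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+){1,3}$")
# Each word needs only its leading capital, so "A" or "I" is allowed.
LOOSE_PATTERN = re.compile(r"^[A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*){1,3}$")

sample = "John A Doe"
strict_result = STRICT_PATTERN.fullmatch(sample) is not None
loose_result = LOOSE_PATTERN.fullmatch(sample) is not None

print(strict_result)  # False: "A" has no second letter
print(loose_result)   # True
```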

Upvotes: 3
