patrick

Reputation: 11731

REGEX to find the first one or two capitalized words in a string

I am looking for a REGEX to find the first one or two capitalized words in a string. If the first two words are both capitalized, I want both of them; otherwise only the first. A hyphen should be considered part of a word.

  1. for "Madonna has a new album" I'm looking for "Madonna"
  2. for "Paul Young has no new album" I'm looking for "Paul Young"
  3. for "Emmerson Lake-palmer is not here" I'm looking for "Emmerson Lake-palmer"

I have been using ^[A-Z]+.*?\b( [A-Z]+.*?\b){0,1} which does great on the first two, but for the 3rd example I get Emmerson Lake, instead of Emmerson Lake-palmer.

What REGEX can I use to find the first one or two capitalized words in the above examples?

Upvotes: 4

Views: 2838

Answers (3)

Serhii Zelenchuk

Reputation: 345

If you need to match a full name only (a string of exactly two words, each starting with a capital letter), this is a simple example:

^([A-Z][a-z]*)(\s)([A-Z][a-z]+)$

Try it. Enjoy!

Upvotes: 0

dotNET

Reputation: 35470

This is probably simpler:

^([A-Z][-A-Za-z]+)(\s[A-Z][-A-Za-z]+)?

Replace + with * if you expect single-letter words.
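To illustrate the difference between the two quantifiers, here is a minimal Python sketch. The helper name first_words and the sample string "A Tribe called quest" are made up for this illustration and are not from the question.

```python
import re

# Pattern as posted: every word needs at least two characters.
PLUS_PATTERN = re.compile(r"^([A-Z][-A-Za-z]+)(\s[A-Z][-A-Za-z]+)?")

# Same pattern with + replaced by *: single-letter words are allowed.
STAR_PATTERN = re.compile(r"^([A-Z][-A-Za-z]*)(\s[A-Z][-A-Za-z]*)?")


def first_words(pattern, text):
    """Hypothetical helper: return the matched prefix, or None if nothing matches."""
    match = pattern.match(text)
    return match.group(0) if match else None


# Hypothetical sample that starts with a single-letter word.
sample = "A Tribe called quest"
print(first_words(PLUS_PATTERN, sample))  # no match, because "A" is only one letter
print(first_words(STAR_PATTERN, sample))  # matches "A Tribe"

# The hyphenated example from the question works with either variant.
print(first_words(PLUS_PATTERN, "Emmerson Lake-palmer is not here"))
```

With + the whole match fails when the first word is a single letter, because the pattern is anchored at the start of the string.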

Upvotes: 3

Wiktor Stribiżew

Reputation: 627600

You may use

^[A-Z][-a-zA-Z]*(?:\s+[A-Z][-a-zA-Z]*)?


Basically, use a character class [-a-zA-Z]* instead of a dot matching pattern to only match letters and a hyphen.

Details

  • ^ - start of string
  • [A-Z] - an uppercase ASCII letter
  • [-a-zA-Z]* - zero or more ASCII letters / hyphens
  • (?:\s+[A-Z][-a-zA-Z]*)? - an optional (1 or 0 due to ? quantifier) sequence of:
    • \s+ - 1+ whitespace
    • [A-Z] - an uppercase ASCII letter
    • [-a-zA-Z]* - zero or more ASCII letters / hyphens
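As a quick check of the ASCII pattern against the three strings from the question, here is a minimal Python sketch. The function name leading_capitalized is an assumption made for this example. The question's original pattern is included for comparison: its lazy dot stops at the first word boundary, and a hyphen creates one.

```python
import re

# Pattern from the question: ".*?\b" stops at the first word boundary,
# so a hyphenated word is cut off before the hyphen.
DOT_PATTERN = re.compile(r"^[A-Z]+.*?\b( [A-Z]+.*?\b){0,1}")

# Pattern from this answer: the character class keeps hyphens inside a word.
CLASS_PATTERN = re.compile(r"^[A-Z][-a-zA-Z]*(?:\s+[A-Z][-a-zA-Z]*)?")


def leading_capitalized(text, pattern=CLASS_PATTERN):
    """Hypothetical helper: return the leading one or two capitalized words, or None."""
    match = pattern.match(text)
    return match.group(0) if match else None


samples = [
    "Madonna has a new album",
    "Paul Young has no new album",
    "Emmerson Lake-palmer is not here",
]

for sample in samples:
    # Show the result of the old and the new pattern side by side.
    print(leading_capitalized(sample, DOT_PATTERN), "|", leading_capitalized(sample))
```

Only the third string differs: the dot-based pattern returns "Emmerson Lake", while the character-class pattern returns "Emmerson Lake-palmer".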

A Unicode aware equivalent (for the regex flavors supporting Unicode property classes):

^\p{Lu}[-\p{L}]*(?:\s+\p{Lu}[-\p{L}]*)?

where \p{L} matches any letter and \p{Lu} matches any uppercase letter.

Upvotes: 6
