Regex to match words with first capital letter

Question

I am trying to identify structure in my text data using a regex, and I am hitting roadblocks.

For the sample text below:

I AM A HEADER:
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s.

I AM A TAB- Lorem Ipsum is simply dummy text of the printing

My regular expression below picks up 'I AM A HEADER:' and 'I AM A TAB-':

^\s*(?:\b[A-Z]+\b[\s]*)+(?:[:-])\s*$

Please suggest an edit so that it also matches mixed-case headings such as 'I Am A Header' and 'I Am A Tab', and leaves the end-markers ':' and '-' out of the match.

Wiktor Stribiżew · Accepted Answer

You can use

^\s*(?:\b[a-zA-Z]+\b\s*)+(?=[:-])

Regex breakdown:

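^ - start of the string (or of each line, when the multiline flag is on)
\s* - 0 or more whitespace characters
(?:\b[a-zA-Z]+\b\s*)+ - 1 or more sequences of
- \b - a word boundary (redundant here)
- [a-zA-Z]+ - 1 or more ASCII letters
- \b\s* - a word boundary followed by 0 or more whitespace characters
(?=[:-]) - a positive lookahead requiring a : or - immediately after the preceding subpattern, without consuming it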
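
To see the breakdown in action, here is a minimal Python sketch that runs the pattern over a shortened copy of the question's sample text. The names HEADER_RE, SAMPLE, and find_labels are made up for illustration, and the re.MULTILINE flag is an assumption (the answer does not state which flags it uses); without it, ^ would only match at the very start of the text.

```python
import re

# Pattern from the answer. re.MULTILINE is an assumption so that ^ also
# matches at the start of every line, not only at the start of the text.
HEADER_RE = re.compile(r"^\s*(?:\b[a-zA-Z]+\b\s*)+(?=[:-])", re.MULTILINE)

# Shortened copy of the sample text from the question.
SAMPLE = (
    "I AM A HEADER:\n"
    "Lorem Ipsum is simply dummy text of the printing and typesetting industry.\n"
    "\n"
    "I AM A TAB- Lorem Ipsum is simply dummy text of the printing"
)


def find_labels(text):
    """Return every run of words that is immediately followed by ':' or '-'."""
    # \s* may swallow a preceding blank line, so strip the matched text.
    return [m.group(0).strip() for m in HEADER_RE.finditer(text)]


labels = find_labels(SAMPLE)
print(labels)
```

Because the pattern only asks for words followed by : or -, it would also pick up the start of a line such as "well-known fact"; anchoring more strictly is left out of this sketch.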

The main points here are adding [a-z] to the [A-Z] character class, removing \s*$, and turning the (?:[:-]) non-capturing group into a lookahead (which does not consume characters).
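
The effect of swapping the non-capturing group for a lookahead can be checked with a small Python comparison. This is only a sketch: the variable names are illustrative, and the "consuming" variant is the answer's pattern with (?:[:-]) put back in place of the lookahead, not the asker's full original expression.

```python
import re

line = "I AM A HEADER:"

# Variant with the end-marker in a non-capturing group: the marker is
# consumed and therefore becomes part of the match.
consuming = re.compile(r"^\s*(?:\b[a-zA-Z]+\b\s*)+(?:[:-])")

# Variant from the answer: the lookahead only checks that the marker
# follows, so the marker stays outside the match.
lookahead = re.compile(r"^\s*(?:\b[a-zA-Z]+\b\s*)+(?=[:-])")

consumed_match = consuming.match(line).group(0)
lookahead_match = lookahead.match(line).group(0)

print(repr(consumed_match))   # 'I AM A HEADER:'
print(repr(lookahead_match))  # 'I AM A HEADER'
```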
