Regex for matching only capitalized words stuck together (i.e. not separated by whitespace)

Question

I have a long list of strings which are all random words, all of them capitalized, such as 'Pomegranate' and 'Yellow Banana'. However, some of them are stuck together, like so: 'AppleOrange'. There are no special characters or digits.

What I need is a regular expression in Python that matches 'Apple' and 'Orange' separately, but not 'Pomegranate' or 'Yellow'.

As you might expect, I'm very new to this, and I've only managed to write r"(?... But that still matches 'Yellow' and 'Pomegranate'. How do I do this?

The fourth bird · Accepted Answer

If every word starts with an uppercase char followed by optional lowercase chars, you can use lookarounds and an alternation to match both variations:

(?<=[a-z])[A-Z][a-z]*|[A-Z][a-z]*(?=[A-Z])

The pattern matches:

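(?<=[a-z]) Assert a lowercase char a-z directly to the left
[A-Z][a-z]* Match a single uppercase char A-Z followed by optional lowercase chars a-z
| Or
[A-Z][a-z]* Match a single uppercase char A-Z followed by optional lowercase chars a-z
(?=[A-Z]) Assert an uppercase char A-Z directly to the right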
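
To see why both alternatives are needed, here is a small sketch that runs each half of the pattern on its own. The three-word input 'AppleOrangeBanana' and the variable names are made up for illustration and are not from the question.

```python
import re

# The two alternatives of the pattern, split apart for illustration.
after_lower = r"(?<=[a-z])[A-Z][a-z]*"   # word preceded by a lowercase char
before_upper = r"[A-Z][a-z]*(?=[A-Z])"   # word followed by an uppercase char
combined = after_lower + "|" + before_upper

s = "AppleOrangeBanana"  # hypothetical input with three words stuck together

first_only = re.findall(after_lower, s)
second_only = re.findall(before_upper, s)
both = re.findall(combined, s)

print(first_only)   # ['Orange', 'Banana']
print(second_only)  # ['Apple', 'Orange']
print(both)         # ['Apple', 'Orange', 'Banana']
```

The first alternative alone misses the leading word and the second alone misses the trailing word, so the alternation is what covers every word in the run.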


Example

import re

pattern = r"(?<=[a-z])[A-Z][a-z]*|[A-Z][a-z]*(?=[A-Z])"
s = ("AppleOrange\n"
     "Pomegranate Yellow Banana")

print(re.findall(pattern, s))

Output

['Apple', 'Orange']
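
Since the question mentions a long list of strings, the same pattern could be applied to each entry in turn. This is only a sketch: the helper name stuck_words and the sample list are assumptions for illustration.

```python
import re

# Same pattern as above, compiled once because it is reused for every string.
STUCK = re.compile(r"(?<=[a-z])[A-Z][a-z]*|[A-Z][a-z]*(?=[A-Z])")

def stuck_words(strings):
    """Return a flat list of the words that are glued to a neighbouring word."""
    found = []
    for s in strings:
        found.extend(STUCK.findall(s))
    return found

words = ["Pomegranate", "Yellow Banana", "AppleOrange"]  # hypothetical sample list
result = stuck_words(words)
print(result)  # ['Apple', 'Orange']
```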

Another option is to match what you don't want in order to get it out of the way, use a capture group for what you want to keep, and then remove the empty entries from the result:

(?


import re

pattern = r"(?
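
The pattern for this second option is cut off here. As a rough sketch of the same idea, and not a reconstruction of the answer's pattern, the version below uses an assumed pattern: the first alternative consumes a word that stands alone between whitespace boundaries, and the second alternative captures whatever capitalized word is left.

```python
import re

# Assumed pattern for illustration only, not the one from the answer:
#   (?<!\S)[A-Z][a-z]*(?!\S)  matches a standalone word and captures nothing
#   ([A-Z][a-z]*)             captures any remaining (stuck-together) word
assumed = r"(?<!\S)[A-Z][a-z]*(?!\S)|([A-Z][a-z]*)"

s = ("AppleOrange\n"
     "Pomegranate Yellow Banana")

# With one capture group, findall returns the group's text for every match,
# which is an empty string whenever the first alternative matched.
raw = re.findall(assumed, s)
kept = [w for w in raw if w]  # remove the empty entries

print(raw)   # ['Apple', 'Orange', '', '', '']
print(kept)  # ['Apple', 'Orange']
```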
