Stephen Yorke

Reputation: 307

Regex to split on uppercase but not if preceded by space?

In Python, is there a way to do the following? I have a string like "Trip HopDowntempoSynth-pop". I can split it on uppercase characters, but what I want is to split on an uppercase character only when it is not preceded by a space.

I tried adding a ! to:

print (re.findall(r'[A-Z](?:A-Z*(?![a-z])|[a-z]*)',line))

but the result was the same wherever I placed it.

Upvotes: 1

Views: 1674

Answers (3)

Philip Tzou

Reputation: 6438

Do you mean something like this?

re.split(r'\s+(?=[A-Z])', "Trip HopDowntempoSynth-pop")
# ['Trip', 'HopDowntempoSynth-pop']

Or the opposite:

pattern = re.compile(r'[A-Z][^A-Z\s]+(?:\s+\S[^A-Z\s]*)*')

pattern.findall("Trip HopDowntempoSynth-pop")
# ['Trip Hop', 'Downtempo', 'Synth-pop']

pattern.findall("Trip Hop HHopDowntempoSynth-pop")
# ['Trip Hop H', 'Hop', 'Downtempo', 'Synth-pop']
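
The same grouping can also be sketched as a zero-width split instead of a match. This is only an illustration, not part of the original answer: it assumes Python 3.7 or later (where re.split accepts patterns that match the empty string), and the names BOUNDARY and split_genres are made up for the example.

```python
import re

# Zero-width boundary: a non-whitespace character behind, an uppercase letter ahead.
# Assumes Python 3.7+, where re.split can split on empty matches.
BOUNDARY = re.compile(r'(?<=\S)(?=[A-Z])')

def split_genres(text):
    """Split before every uppercase letter that directly follows a non-space character."""
    return BOUNDARY.split(text)

print(split_genres("Trip HopDowntempoSynth-pop"))
# ['Trip Hop', 'Downtempo', 'Synth-pop']
```

Using (?<=\S) rather than (?<!\s) means the very first character never counts as a split point, so no empty string appears at the front of the result.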

Upvotes: 5

Nik Roby

Reputation: 154

This isn't a regex solution, but it is simple enough to fit your problem.

s = "Trip HopDowntempoSynth-pop"

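arr = []
word = s[0]  # current chunk; assumes s is not empty
for i in range(1, len(s)):
    if s[i].isupper():
        if s[i - 1] == " ":
            # uppercase right after a space: keep it in the current chunk
            word += s[i]
        else:
            # uppercase not preceded by a space: start a new chunk
            arr.append(word)
            word = s[i]
    else:
        word += s[i]
arr.append(word)  # flush the last chunk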

print(arr)

It prints a list that looks like this:

['Trip Hop', 'Downtempo', 'Synth-pop']
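
If this is needed in more than one place, the same scan could be wrapped in a function that slices the string instead of building each chunk character by character. This is a rough sketch rather than a tested replacement; the function name is hypothetical, and it also returns an empty list for an empty string, which the loop above does not handle.

```python
def split_on_unspaced_capitals(text):
    """Split before each uppercase letter that is not preceded by a space."""
    parts = []
    start = 0  # index where the current chunk begins
    for i in range(1, len(text)):
        if text[i].isupper() and text[i - 1] != " ":
            parts.append(text[start:i])
            start = i
    if text:
        # flush the final chunk; skipped for empty input
        parts.append(text[start:])
    return parts

print(split_on_unspaced_capitals("Trip HopDowntempoSynth-pop"))
# ['Trip Hop', 'Downtempo', 'Synth-pop']
```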

Upvotes: 0

caw5cv

Reputation: 721

This is potentially roundabout, but it achieves what I think you're looking for: find every lowercase-then-uppercase pair with re.findall, insert a placeholder character between the two letters with str.replace, and then split on that placeholder.

import re
s = "Trip HopDowntempoSynth-pop"

pattern = re.compile("[a-z][A-Z]")

matches = re.findall(pattern, s)

for match in matches:
    match_replacer = match[0] + '|' + match[1]
    s = s.replace(match, match_replacer)

print(s.split('|'))

which gives the output

['Trip Hop', 'Downtempo', 'Synth-pop']
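
The placeholder idea can also be compressed into a single re.sub call with backreferences, which avoids the explicit loop. The sketch below is an illustration only: the function name and the placeholder parameter are made up for the example, and it assumes the placeholder character never occurs in the input string.

```python
import re

def split_with_placeholder(text, placeholder="|"):
    """Mark each lowercase-to-uppercase boundary, then split on the marker."""
    # \1 is the lowercase letter, \2 the uppercase letter that follows it.
    # Assumption: `placeholder` does not appear anywhere in `text`.
    marked = re.sub(r'([a-z])([A-Z])', r'\1' + placeholder + r'\2', text)
    return marked.split(placeholder)

print(split_with_placeholder("Trip HopDowntempoSynth-pop"))
# ['Trip Hop', 'Downtempo', 'Synth-pop']
```

Like the loop version, this only splits where a lowercase letter is immediately followed by an uppercase one, so a capital after a hyphen or a digit is left alone.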

Upvotes: 1
