Reputation: 307
In Python is there any way of doing the following? I have a string like "Trip HopDowntempoSynth-pop"
and I am able to split on the uppercase character, but what I want is to split on uppercase unless preceded by a space.
I tried adding a ! (negation) to this pattern:
print(re.findall(r'[A-Z](?:[A-Z]*(?![a-z])|[a-z]*)', line))
but it made no difference wherever I placed it.
Upvotes: 1
Views: 1674
Reputation: 6438
Do you mean something like this?
import re
# Raw string so that \s reaches the regex engine unescaped
re.split(r'\s+(?=[A-Z])', "Trip HopDowntempoSynth-pop")
# ['Trip', 'HopDowntempoSynth-pop']
Or the opposite:
pattern = re.compile(r'[A-Z][^A-Z\s]+(?:\s+\S[^A-Z\s]*)*')
pattern.findall("Trip HopDowntempoSynth-pop")
# ['Trip Hop', 'Downtempo', 'Synth-pop']
pattern.findall("Trip Hop HHopDowntempoSynth-pop")
# ['Trip Hop H', 'Hop', 'Downtempo', 'Synth-pop']
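If the goal is literally "split before an uppercase letter unless a space precedes it", a zero-width split with a lookbehind and a lookahead may express it more directly. This is only a sketch: the function name is made up for illustration, and it assumes Python 3.7 or later, where re.split accepts patterns that match the empty string.

```python
import re

def split_on_unspaced_capitals(text):
    # Hypothetical helper name, not from the original answer.
    # Split at every position that follows a non-whitespace character
    # and precedes an uppercase ASCII letter. Both assertions are
    # zero-width, so no characters are consumed or lost.
    # Assumes Python 3.7+, where re.split splits on empty matches.
    return re.split(r'(?<=\S)(?=[A-Z])', text)

print(split_on_unspaced_capitals("Trip HopDowntempoSynth-pop"))
# ['Trip Hop', 'Downtempo', 'Synth-pop']
```

Using (?<=\S) rather than (?<!\s) avoids an empty first element, because the lookbehind cannot succeed at the very start of the string.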
Upvotes: 5
Reputation: 154
This isn't a regex, but it is simple enough to fit your problem.
s = "Trip HopDowntempoSynth-pop"
arr = []
word = s[0]
for i in range(1, len(s)):
    if s[i].isupper():
        if s[i - 1] == " ":
            # Uppercase after a space: stay in the current word
            word += s[i]
        else:
            # Uppercase after a non-space: start a new word
            arr.append(word)
            word = s[i]
    else:
        word += s[i]
arr.append(word)
print(arr)
It prints the following list:
['Trip Hop', 'Downtempo', 'Synth-pop']
Upvotes: 0
Reputation: 721
This is potentially roundabout, but it achieves what I think you're looking for: iterate over the re.findall matches, use str.replace to insert a placeholder character at each lowercase-to-uppercase boundary, and then split on that placeholder.
import re
s = "Trip HopDowntempoSynth-pop"
pattern = re.compile("[a-z][A-Z]")
matches = re.findall(pattern, s)
for match in matches:
    # Insert '|' between the lowercase and the uppercase letter
    match_replacer = match[0] + '|' + match[1]
    s = s.replace(match, match_replacer)
s.split('|')
which gives the output
['Trip Hop', 'Downtempo', 'Synth-pop']
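The same placeholder idea can probably be collapsed into a single re.sub call with two capture groups, which removes the explicit loop. The sketch below uses a hypothetical function name and assumes the placeholder character never occurs in the input.

```python
import re

def split_with_placeholder(text, placeholder="|"):
    # Hypothetical helper name; assumes `placeholder` does not appear in `text`.
    # Insert the placeholder between every lowercase letter that is
    # immediately followed by an uppercase letter, then split on it.
    marked = re.sub(
        r'([a-z])([A-Z])',
        lambda m: m.group(1) + placeholder + m.group(2),
        text,
    )
    return marked.split(placeholder)

print(split_with_placeholder("Trip HopDowntempoSynth-pop"))
# ['Trip Hop', 'Downtempo', 'Synth-pop']
```

Like the loop version, this only splits at a lowercase-to-uppercase boundary, so two adjacent capitals (as in "HHop") stay together.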
Upvotes: 1