Reputation: 41
I'm using the following function to separate strings with commas right on the capitals, as long as it is not preceded by a blank space.
def func(x):
y = re.findall('[A-Z][^A-Z\s]+(?:\s+\S[^A-Z\s]*)*', x)
return ','.join(y)
However, when I try to separate the next string it removes the part with numbers.
Input = '49ersRiders Mapple'
Output = 'Riders Mapple'
I tried the following code but now it removes the 'ers' part.
def test(x):
y = re.findall(r'\d+[A-Z]*|[A-Z][^A-Z\s]+(?:\s+\S[^A-Z\s]*)*', x)
return ','.join(y)
Output = '49,Riders Mapple'
The output I'm looking for is this:
'49ers,Riders Mapple'
Is it possible to add this indication to my regex?
Thanks in advance
Upvotes: 1
Views: 68
Reputation: 120479
Maybe naive but why don't you use re.sub
:
def func(x):
return re.sub(r'(?<!\s)([A-Z])', r',\1', x)
inp = '49ersRiders Mapple'
out = func(inp)
print(out)
# Output
49ers,Riders Mapple
Upvotes: 1
Reputation: 522119
Here is a regex re.findall
approach:
inp = "49ersRiders"
output = ','.join(re.findall('(?:[A-Z]|[0-9])[^A-Z]+', inp))
print(output) # 49ers,Riders
The regex pattern used here says to match:
(?:
[A-Z] a leading uppercase letter (try to find this first)
| OR
[0-9] a leading number (fallback for no uppercase)
)
[^A-Z]+ one or more non capital letters following
Upvotes: 1