Loki

Reputation: 41

Split string with commas while keeping numeric parts

I'm using the following function to split a string at its capital letters and join the pieces with commas, as long as the capital is not preceded by a space.

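import re

def func(x):
    # Each match starts at a capital letter and absorbs any words that
    # follow whitespace, so a capital after a space is not split off
    y = re.findall(r'[A-Z][^A-Z\s]+(?:\s+\S[^A-Z\s]*)*', x)
    return ','.join(y)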

However, when I apply it to the following string, the numeric part is dropped:

Input = '49ersRiders Mapple'

Output = 'Riders Mapple'

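I tried the following code, but now it drops the 'ers' part:

import re

def test(x):
    # First alternative: digits followed by optional capitals;
    # second alternative: the same pattern as in func()
    y = re.findall(r'\d+[A-Z]*|[A-Z][^A-Z\s]+(?:\s+\S[^A-Z\s]*)*', x)
    return ','.join(y)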

Output = '49,Riders Mapple'
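The 'ers' is lost because [A-Z]* after \d+ only accepts capital letters, so the lowercase tail after the digits is never consumed by either alternative. A minimal sketch of the same findall approach, assuming the numeric chunk should run until the next capital letter or space, swaps that class for [^A-Z\s]* (split_caps and PATTERN are hypothetical names used only for illustration):

```python
import re

# Same two alternatives as test(), but the numeric branch now takes every
# following character that is neither a capital letter nor whitespace.
PATTERN = r'\d+[^A-Z\s]*|[A-Z][^A-Z\s]+(?:\s+\S[^A-Z\s]*)*'

def split_caps(x):
    # Hypothetical helper: find the chunks, then join them with commas
    return ','.join(re.findall(PATTERN, x))

print(split_caps('49ersRiders Mapple'))  # 49ers,Riders Mapple
```

Strings that start with a capital behave as before, e.g. 'RidersMapple' still becomes 'Riders,Mapple'.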

The output I'm looking for is this:

'49ers,Riders Mapple'

Is it possible to handle this case in my regex?

Thanks in advance

Upvotes: 1

Views: 68

Answers (2)

Corralien

Reputation: 120479

Maybe naive, but why not use re.sub?

import re

def func(x):
    # Insert a comma before every capital letter not preceded by whitespace
    return re.sub(r'(?<!\s)([A-Z])', r',\1', x)

inp = '49ersRiders Mapple'
out = func(inp)
print(out)

# Output
49ers,Riders Mapple
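One caveat not covered by the example input: the negative lookbehind (?<!\s) also succeeds at the very start of the string, where there is no preceding character, so an input that begins with a capital gets a leading comma. A sketch of one way around that, using a positive lookbehind (?<=\S) so a comma is only inserted when some non-space character actually precedes the capital (func_original and func_no_lead are hypothetical names for comparison):

```python
import re

def func_original(x):
    # As in the answer: the negative lookbehind also matches at position 0
    return re.sub(r'(?<!\s)([A-Z])', r',\1', x)

def func_no_lead(x):
    # Require an actual non-space character right before the capital
    return re.sub(r'(?<=\S)([A-Z])', r',\1', x)

print(func_original('RidersMapple'))       # ,Riders,Mapple
print(func_no_lead('RidersMapple'))        # Riders,Mapple
print(func_no_lead('49ersRiders Mapple'))  # 49ers,Riders Mapple
```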

Upvotes: 1

Tim Biegeleisen

Reputation: 522119

Here is a regex re.findall approach:

import re

inp = "49ersRiders"
output = ','.join(re.findall('(?:[A-Z]|[0-9])[^A-Z]+', inp))
print(output)  # 49ers,Riders

The regex pattern used here says to match:

(?:
    [A-Z]  a leading uppercase letter (try to find this first)
    |      OR
    [0-9]  a leading number (fallback for no uppercase)
)
[^A-Z]+    one or more following characters that are not uppercase letters (digits and spaces included)
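The example above only uses '49ersRiders'. On the full input from the question, [^A-Z]+ also consumes the space before 'Mapple', so the space ends up inside the 'Riders ' chunk and 'Mapple' becomes a separate piece. A sketch, assuming the goal is the question's '49ers,Riders Mapple', keeps the capital-or-digit start but borrows the question's own whitespace-continuation group (pattern is just an illustrative variable name):

```python
import re

inp = '49ersRiders Mapple'

# Pattern from the answer, applied to the full input from the question
print(','.join(re.findall('(?:[A-Z]|[0-9])[^A-Z]+', inp)))  # 49ers,Riders ,Mapple

# Variant: start at a capital or digit, stop at the next capital or space,
# but keep absorbing words that follow whitespace (as in the question's regex)
pattern = r'(?:[A-Z]|[0-9])[^A-Z\s]*(?:\s+\S[^A-Z\s]*)*'
print(','.join(re.findall(pattern, inp)))  # 49ers,Riders Mapple
```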

Upvotes: 1
