regex_hm

Reputation: 19

Regex substitution in Python leads to control characters appearing

I am trying to use a regex to fix some spacing issues in some text.

Strings look like this:

a = "Here is a shortString with various issuesWith spacing"

My regex looks like this right now: new_string = re.sub("[a-z][A-Z]", "\1 \2", a).

This is meant to find the places where a space is missing (there is always a capital letter immediately after a lowercase letter) and insert a space.

Unfortunately, the output looks like this:

Here is a shor\x01 \x02tring with various issue\x01 \x02ith spacing

I want it to look like this:

b = "Here is a short String with various issues With spacing"

It seems that the regex matches the correct places, but something is wrong with my substitution. I thought \1 \2 meant: insert the first part of the match, add a space, and then insert the second part. Why do I get something else?
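A quick way to see what is happening (a sketch using only the standard `re` module): the pattern does find the right spots, but in an ordinary string literal `"\1"` is an octal escape for the character with code 1, so the replacement string never contains a backslash at all.

```python
import re

a = "Here is a shortString with various issuesWith spacing"

# The pattern itself finds the two places where a space is missing.
matches = re.findall("[a-z][A-Z]", a)
print(matches)

# In a normal (non-raw) literal, "\1" and "\2" are octal escapes,
# so the replacement is the 3-character string "\x01 \x02".
replacement = "\1 \2"
print(repr(replacement))

# re.sub therefore inserts those control characters literally
# in place of each two-letter match.
broken = re.sub("[a-z][A-Z]", replacement, a)
print(repr(broken))
```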

Upvotes: 1

Views: 62

Answers (3)

jcxu

Reputation: 86

You can try lookarounds instead of capturing groups:

>>> import re
>>> a = "Here is a shortString with various issuesWith spacing"
>>> re.sub(r"(?<=[a-z])(?=[A-Z])", " ", a)
'Here is a short String with various issues With spacing'
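The lookbehind `(?<=[a-z])` and lookahead `(?=[A-Z])` only check the neighbouring characters without consuming them, so each match is an empty string sitting between the two letters, and the replacement `" "` is simply inserted there. A small sketch showing those zero-width match positions:

```python
import re

a = "Here is a shortString with various issuesWith spacing"
pattern = re.compile(r"(?<=[a-z])(?=[A-Z])")

# Each match is empty (start == end), positioned between a lowercase
# letter and the uppercase letter that follows it.
spans = [m.span() for m in pattern.finditer(a)]
print(spans)

# Substituting " " at those empty matches inserts a space without
# having to re-emit the neighbouring letters via backreferences.
fixed = pattern.sub(" ", a)
print(fixed)
```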

Upvotes: 0

Wiktor Stribiżew

Reputation: 627101

You need to define capturing groups and use raw string literals:

import re
a = "Here is a shortString with various issuesWith spacing"
new_string = re.sub(r"([a-z])([A-Z])", r"\1 \2", a)
print(new_string)


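Note that without the r'' prefix, Python interprets \1 and \2 in an ordinary string literal as octal escape sequences, producing the control characters \x01 and \x02 rather than backreferences. In a raw string literal, \ is kept as a literal backslash, so the regex engine receives \1 and \2 and treats them as references to the first and second capturing groups.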
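Escaping the backslashes by hand (`"\\1 \\2"`) is equivalent to using the raw string. Python's `re` module also accepts the `\g<1>` group syntax in the replacement, and `re.sub` can take a function instead of a template string, which sidesteps escaping entirely. A sketch comparing these variants, which should all give the same result:

```python
import re

a = "Here is a shortString with various issuesWith spacing"
pattern = r"([a-z])([A-Z])"

# Raw string: the backslashes reach the regex engine unchanged.
raw = re.sub(pattern, r"\1 \2", a)

# Doubled backslashes in a normal literal produce the same template.
escaped = re.sub(pattern, "\\1 \\2", a)

# \g<N> names the group explicitly (still written as a raw string).
named = re.sub(pattern, r"\g<1> \g<2>", a)

# A callable receives each Match object and returns the replacement,
# so no backreference syntax is needed at all.
func = re.sub(pattern, lambda m: f"{m.group(1)} {m.group(2)}", a)

print(raw == escaped == named == func)  # True
print(raw)
```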

Upvotes: 1

warownia1

Reputation: 2935

>>> import re
>>> a = "Here is a shortString with various issuesWith spacing"
>>> re.sub("([a-z])([A-Z])", r"\1 \2", a)
'Here is a short String with various issues With spacing'

The capturing groups were missing, and the backslashes in the replacement string were not escaped.
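Both fixes are needed. With the raw replacement but no parentheses in the pattern, there is no group 1 to refer to, and `re.sub` raises `re.error` instead of producing output. A quick check, assuming Python 3:

```python
import re

a = "Here is a shortString with various issuesWith spacing"

# Raw replacement, but the pattern defines no capturing groups,
# so the template's reference to group 1 is rejected.
try:
    re.sub("[a-z][A-Z]", r"\1 \2", a)
    error = None
except re.error as exc:
    error = str(exc)
print(error)

# Groups in the pattern plus a raw replacement string works.
fixed = re.sub("([a-z])([A-Z])", r"\1 \2", a)
print(fixed)
```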

You can go even further and normalize the capitalization:

>>> a = "Here is a shortString with various issuesWith spacing"
>>> re.sub('([a-z])([A-Z])', r'\1 \2', a).lower().capitalize()
'Here is a short string with various issues with spacing'
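Note that `.lower().capitalize()` lowercases every character after the first, so any legitimate capitals elsewhere in the sentence (names, acronyms) would be lost too. If the goal is only to lowercase the letter that starts each split-off word, a function replacement can do that inside the substitution itself. A sketch, assuming that is the desired behaviour; `split_and_lower` is a hypothetical helper name:

```python
import re

def split_and_lower(text: str) -> str:
    """Hypothetical helper: insert a space at each lowercase-to-uppercase
    transition and lowercase only that uppercase letter."""
    return re.sub(
        r"([a-z])([A-Z])",
        lambda m: f"{m.group(1)} {m.group(2).lower()}",
        text,
    )

a = "Here is a shortString with various issuesWith spacing"
print(split_and_lower(a))

# Unlike .lower().capitalize(), an unrelated capital such as "Python" survives.
print(split_and_lower("I like Python and camelCase"))
```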

Upvotes: 2
