Reputation: 19
I am trying to use a regex to fix some spacing issues in some text.
Strings look like this:
a = "Here is a shortString with various issuesWith spacing"
My regex looks like this right now:
new_string = re.sub("[a-z][A-Z]", "\1 \2", a)
This is meant to find the places with a missing space (there is always a capital letter directly after a lowercase letter) and add a space there.
Unfortunately, the output looks like this:
Here is a shor\x01 \x02tring with various issue\x01 \x02ith spacing
I want it to look like this:
b = "Here is a short String with various issues With spacing"
It seems that the regex is matching the right places, but there is something wrong with my substitution. I thought \1 \2 meant: replace with the first matched part, add a space, and then add the second matched part. But for some reason I get something else.
Upvotes: 1
Views: 62
Reputation: 86
You can try it like this:
>>> import re
>>> a = "Here is a shortString with various issuesWith spacing"
>>> re.sub(r"(?<=[a-z])(?=[A-Z])", " ", a)
'Here is a short String with various issues With spacing'
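This works because (?<=[a-z]) and (?=[A-Z]) are zero-width lookarounds: the pattern matches the empty position between a lowercase and an uppercase letter, so nothing is consumed and there is nothing to put back. A small sketch comparing it with the capturing-group approach (the two helper function names are made up for illustration):

```python
import re


def split_with_lookarounds(text):
    # Matches the empty position between a lowercase and an uppercase letter
    # and inserts a space there; no characters are consumed.
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)


def split_with_groups(text):
    # Consumes both letters and writes them back with a space in between.
    return re.sub(r"([a-z])([A-Z])", r"\1 \2", text)


a = "Here is a shortString with various issuesWith spacing"
print(split_with_lookarounds(a))
print(split_with_lookarounds(a) == split_with_groups(a))
```

For this kind of input both approaches should give the same result; the lookaround version just avoids backreferences in the replacement string altogether.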
Upvotes: 0
Reputation: 627101
You need to define capturing groups, and use raw string literals:
import re
a = "Here is a shortString with various issuesWith spacing"
new_string = re.sub(r"([a-z])([A-Z])", r"\1 \2", a)
print(new_string)
See the Python demo.
Note that without the r'' prefix, Python interpreted the \1 and \2 as the characters \x01 and \x02 rather than as backreferences, since each \ was parsed as the start of an escape sequence. In raw string literals, \ is parsed as a literal backslash, so the regex engine receives \1 and \2 unchanged.
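To see the difference directly, here is a minimal sketch that compares the two replacement literals (the variable names plain and raw are only for illustration):

```python
import re

a = "Here is a shortString with various issuesWith spacing"

# Without the r prefix, "\1" and "\2" are octal escapes: single control characters.
plain = "\1 \2"
# With the r prefix, the backslashes survive and reach the regex engine.
raw = r"\1 \2"

print(len(plain), len(raw))   # 3 5
print(plain == "\x01 \x02")   # True

# Only the raw replacement is treated by re.sub as backreferences.
print(re.sub(r"([a-z])([A-Z])", raw, a))
```

The non-raw literal is three characters long (a control character, a space, another control character), which is exactly the \x01 \x02 that showed up in the question's output.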
Upvotes: 1
Reputation: 2935
>>> import re
>>> a = "Here is a shortString with various issuesWith spacing"
>>> re.sub("([a-z])([A-Z])", r"\1 \2", a)
'Here is a short String with various issues With spacing'
The capturing groups and the backslash escaping were missing.
You can go even further and normalize the case as well:
>>> a = "Here is a shortString with various issuesWith spacing"
>>> re.sub('([a-z])([A-Z])', r'\1 \2', a).lower().capitalize()
'Here is a short string with various issues with spacing'
Upvotes: 2