Reputation: 281
I've asked this question in the past here: "Replacing part of string keeps adding extra backslash" but the issue is still proving troublesome to resolve.
ISSUE: Using re.sub()
I am unable to insert an odd number of backslashes into part of my string. Assuming I have the following string:
sample_string = 'foo_${bar}_${wasd}_asdf$'
I want my output string to be the following:
new_string = 'foo_\\\\\${bar}_\\\\\${wasd}_asdf$'
Here is just a small sample of everything that I have tried:
new_string = re.sub(r'\$\{bar\}_\$\{wasd\}', "\\\\\\\\\\${bar}_\\\\\\\\\\${wasd}", sample_string)
#new_string ends up being: 'foo_\\\\\\${bar}_\\\\\\${wasd}_asdf$'
new_string = re.sub(r'[$][{]', "\\\\\\\\\\${", sample_string)
#new_string ends up being: 'foo_\\\\\\${bar}_\\\\\\${wasd}_asdf$'
new_string = re.sub(r'[$][{]', r"\\\\\${", sample_string)
#new_string ends up being: 'foo_\\\\\\${bar}_\\\\\\${wasd}_asdf$'
As you can see, I've treated the replacement string both as a regular string, where the backslash is an escape character, and as a raw string, where the backslash is not treated as an escape. Strangely, both approaches insert 6 backslashes into new_string rather than 5.
Also, here are some outputs from when I tried to insert a different number of backslashes into sample_string:
#Insert 3 backslashes - does NOT work as expected
new_string = re.sub(r'[$][{]', r"\\\${", sample_string)
#new_string ends up being: 'foo_\\\\${bar}_\\\\${wasd}_asdf$'
#Insert 4 backslashes - works AS expected
new_string = re.sub(r'[$][{]', r"\\\\${", sample_string)
#new_string ends up being: 'foo_\\\\${bar}_\\\\${wasd}_asdf$'
#Insert 5 backslashes - does NOT work as expected
new_string = re.sub(r'[$][{]', r"\\\\\${", sample_string)
#new_string ends up being: 'foo_\\\\\\${bar}_\\\\\\${wasd}_asdf$'
#Insert 6 backslashes - works AS expected
new_string = re.sub(r'[$][{]', r"\\\\\\${", sample_string)
#new_string ends up being: 'foo_\\\\\\${bar}_\\\\\\${wasd}_asdf$'
If I could get some help as to why I can't substitute in 5 or 3 backslashes correctly, but I can substitute in 4 or 6 backslashes correctly, I would GREATLY appreciate it!!
Upvotes: 0
Views: 66
Reputation: 12938
That is strange. I am guessing it has to do with Python choosing whether to display escape characters. For instance, if I do:
new_string = re.sub(r'\${', r'\\\\\\\\\\${', sample_string) # five sets of "\\"
new_string
# 'foo_\\\\\\\\\\${bar}_\\\\\\\\\\${wasd}_asdf$' --- still five sets of "\\"
print(new_string)
# foo_\\\\\${bar}_\\\\\${wasd}_asdf$ --- just five "\"
Et voilà, five back-slashes. If you just display the string at the interactive prompt, Python shows its repr, in which every back-slash is written as "\\". If you print it, you see the characters the string actually contains.
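To check this without eyeballing the output, you can count the back-slashes directly. This is a minimal sketch using the sample_string from the question; the names actual and shown are just illustrative.

```python
import re

sample_string = 'foo_${bar}_${wasd}_asdf$'

# Ten backslashes in the raw replacement; re.sub turns each "\\" pair into one.
new_string = re.sub(r'\$\{', r'\\\\\\\\\\${', sample_string)

actual = new_string.count('\\')       # backslashes really stored in the string
shown = repr(new_string).count('\\')  # backslashes in the form the prompt echoes

print(new_string)        # the real characters
print(repr(new_string))  # the escaped display form
print(actual, shown)     # 10 20 (five per match, doubled in the repr)
```

Counting with str.count() or len() sidesteps the display question entirely, since both work on the stored characters rather than on the repr.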
Interestingly, Python seems to assume that you mean to specify escaped back-slashes in your replacement string. Example:
new_string = re.sub(r'\${', r'\${', sample_string) # shouldn't do anything, right?
new_string
# 'foo_\\${bar}_\\${wasd}_asdf$' --- escapes were added!
print(new_string)
# foo_\${bar}_\${wasd}_asdf$ --- now we have explicit back-slashes!
So on the match side, characters like "$" still need to be escaped. Makes sense; these are special characters to regex, so if we want to match the actual dollar-sign character we have to escape it. But on the replacement string side, "$" has no special meaning and need not be escaped. re.sub() does still process back-slash escapes in the replacement (for example, "\\" becomes a single back-slash), but "\$" is not an escape it recognizes, so it is left alone and the back-slash stays in the result. Thus the second example, which looks like it shouldn't do anything, in fact adds a back-slash in front of the dollar-sign, and Python escapes that back-slash when you display the string directly, making it look like two were added. If you want to see what's actually in the string, you have to print it.
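If the goal is an exact number of back-slashes, one way to avoid the escape processing altogether is to pass a function as the replacement, since re.sub() uses a function's return value as-is. A rough sketch, where insert_backslashes is a hypothetical helper name of my own and not anything from the question:

```python
import re

sample_string = 'foo_${bar}_${wasd}_asdf$'

def insert_backslashes(text, count):
    """Prefix every '${' in text with `count` literal backslashes (hypothetical helper)."""
    prefix = '\\' * count  # '\\' is one real backslash
    # A callable replacement is inserted verbatim; no escapes are processed.
    return re.sub(r'\$\{', lambda m: prefix + m.group(0), text)

five = insert_backslashes(sample_string, 5)
print(five)  # foo_\\\\\${bar}_\\\\\${wasd}_asdf$
```

For a fixed substring like this one, sample_string.replace('${', '\\' * 5 + '${') should give the same result without involving regex at all.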
The last two paragraphs of the string literals docs seem to support this (thanks @glibdud for pointing this out). Some select quotes:
Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the result. [...]
Even in a raw literal, quotes can be escaped with a backslash, but the backslash remains in the result; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote [...]
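Putting the pieces together, here is a small sketch of why the odd counts in the question seem to misbehave. It assumes the replacement is used as a plain template string: re.sub() collapses each "\\" pair into one back-slash and leaves a trailing "\$" alone. The function name real_backslashes is mine, purely for illustration.

```python
import re

def real_backslashes(n):
    """Return how many backslashes end up before '${' when the template has n."""
    template = '\\' * n + '${'  # n real backslashes followed by '${'
    result = re.sub(r'\$\{', template, '${x}')
    return result.count('\\')

counts = {n: real_backslashes(n) for n in (3, 4, 5, 6)}
print(counts)  # {3: 2, 4: 2, 5: 3, 6: 3}

# The raw-literal example from the docs quote: two characters, backslash and quote.
print(len(r"\""))  # 2
```

So 3 and 4 template back-slashes both leave 2 in the result (displayed as 4), and 5 and 6 both leave 3 (displayed as 6), which matches the outputs listed in the question.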
Upvotes: 1