user3496060

Reputation: 856

regex.sub unexpectedly modifying the substituting string with some kind of encoding?

I have a path string "...\\JustStuff\\2017GrainHarvest_GQimagesTestStand\\..." that I am inserting into an existing text file in place of another string. I compile a regex pattern and find bounding text to get the location to insert, and then use regex.sub to replace it. I'm doing something like this...

import re

with open(imextXML, 'r') as file:
    filedata = file.read()
# Match only the text between the <directoryPath> tags.
redirpath = re.compile("(?<=<directoryPath>).*(?=</directoryPath>)", re.ASCII)
filedatatemp = redirpath.sub(newdir, filedata)

The inserted text comes out garbled, though: "\\201" is replaced with "\x81", and "\\" is replaced with "\" (a single backslash).

i.e. "...\\JustStuff\\2017GrainHarvest_GQimagesTestStand\\..." becomes "...\\JustStuff\x817GrainHarvest_GQimagesTestStand\..."

What simple thing am I missing here to fix it?

Update:

To break this down further, here is a snippet you can copy and paste to reproduce the issue:

import re

t2 = r'\JustStuff\2017GrainHarvest_GQimagesTestStand\te'
redirpath = re.compile("(?<=<directoryPath>).*(?=</directoryPath>)", re.ASCII)
temp = r"<directoryPath>aasdfgsdagewweags</directoryPath>"
redirpath.sub(t2, temp)

produces...

>>'<directoryPath>\\JustStuff\x817GrainHarvest_GQimagesTestStand\te</directoryPath>'

Upvotes: 4

Views: 137

Answers (1)

Turn

Reputation: 7030

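re.sub processes backslash escapes in the replacement string, so a sequence such as "\201" is read as an octal escape and becomes "\x81". When you define the string that you want to insert, prefix it with an r to indicate that it is a raw string literal, and keep the backslashes doubled so that each "\\" in the replacement is turned back into a single backslash: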

>>> import re
>>> rex = re.compile('a')
>>> s = 'path\\with\\2017'
>>> sr = r'path\\with\\2017'
>>> rex.sub(s, 'ab')
'path\\with\x817b'
>>> rex.sub(sr, 'ab')
'path\\with\\2017b'
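
If the path arrives in a variable and cannot be retyped as a literal, two other approaches may help. The following is a minimal sketch, not the only way to do it: the names new_dir, xml_text and pattern are made up for illustration, and the path is the one from the question's reproduction snippet. It relies on two documented behaviours of re.sub: a function passed as the replacement has its return value inserted without escape processing, and a doubled backslash in a replacement string yields one literal backslash.

```python
import re

# Hypothetical inputs modelled on the question's reproduction snippet.
new_dir = r'\JustStuff\2017GrainHarvest_GQimagesTestStand\te'
xml_text = '<directoryPath>aasdfgsdagewweags</directoryPath>'

pattern = re.compile(r'(?<=<directoryPath>).*(?=</directoryPath>)', re.ASCII)

# Option 1: pass a function as the replacement. Its return value is
# inserted literally, so no backslash escapes are interpreted.
result_callable = pattern.sub(lambda match: new_dir, xml_text)

# Option 2: double every backslash so that the replacement template
# turns each pair back into a single literal backslash.
result_escaped = pattern.sub(new_dir.replace('\\', '\\\\'), xml_text)

print(result_callable)
print(result_escaped)
```

Both calls should leave the path between the tags exactly as it was stored in new_dir, with single backslashes and no "\x81".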

Upvotes: 2
