SCE

Reputation: 21

Python re.sub() function converting "\t" in a file path to a tab character

I am trying to take a C++ file that has already been written and add header files to its list of includes using a Python script. Currently, I build a string containing all of the includes I want to add, and then use the re module to replace one of the existing includes with that string. All of the includes have a "\t" in their names, and this is causing issues: instead of the expected line (#include "abc\type\GenericTypeMT.h"), I am getting #include "abc ype\GenericTypeMT.h", with a tab character in place of \t. When I print my string to the console, it has the expected form, which leads me to believe that this is an re.sub() issue and not an issue with writing to the file. Below is the code.

import re

INCLUDE = "#include \"abc\\type\\"

with open("file.h", "r+") as f:
    a = f.read()              # whole original file
    f.seek(0, 0)              # rewind so the new contents overwrite it
    b = ""
    with open("types.txt", "r") as types:
        for t in types:
            head = INCLUDE + t.strip() + "MT.h\""
            b = b + head + "\n"
    # Replace the existing GenericTypeMT.h include with the whole block
    a = re.sub(r'#include "abc\\type\\GenericTypeMT\.h"', b, a)
    print(b)
    print(a)
    f.write(a)
     f.write(a)

The output for b is:

#include "abc\type\GenericTypeMT.h"
#include "abc\type\ServiceTypeMT.h"
#include "abc\type\AnotherTypeMT.h"

The (truncated) output for a is:

/* INCLUDES *********************************/
#include "abc   ype\GenericTypeMT.h"
#include "abc   ype\ServiceTypeMT.h"
#include "abc   ype\AnotherTypeMT.h"

#include <map>
...

The closest thing to my question that I could find was "How to write \t to file using Python", but that is different from my problem, since mine seems to stem from the substitution done by the regular expression, as shown by the print before the write.

Upvotes: 1

Views: 1537

Answers (1)

Martijn Pieters

Reputation: 1123860

The re.sub() function processes escape sequences in the replacement string too. The \t character sequence in your replacement string (two characters, \ and t) is interpreted by the re module as the escape sequence for a tab character:

>>> import re
>>> re.sub(r'^.', '\\t', 'foo')
'\too'
>>> print(re.sub(r'^.', '\\t', 'foo'))
    oo
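The demo above only uses \t. The question's replacement string also contains \G, \S and \A (from GenericTypeMT.h, ServiceTypeMT.h and AnotherTypeMT.h). A minimal check, assuming Python 3.7 or later: there, unknown escapes made of \ and an ASCII letter in the replacement are errors, whereas Python 2 (which the question's print b syntax suggests) passed them through unchanged. So on a current Python the same code fails loudly instead of silently inserting a tab.

```python
import re

pattern = r'#include "abc\\type\\GenericTypeMT\.h"'
text = '#include "abc\\type\\GenericTypeMT.h"\n'

# "\t" is a known template escape, so it silently becomes a real tab.
mangled = re.sub(pattern, r'#include "abc\type.h"', text)
print(repr(mangled))

# "\G" is not a known escape; on Python 3.7+ this raises re.error.
error_message = None
try:
    re.sub(pattern, r'#include "abc\type\GenericTypeMT.h"', text)
except re.error as exc:
    error_message = str(exc)
print(error_message)
```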

If you pass a function as the replacement instead, no such expansion takes place. That also means group references (placeholders such as \1 or \g<name>) are not processed; you'd have to use the match object passed into the function to build the replacement yourself.
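To illustrate the point about placeholders, here is a small sketch on a hypothetical key=value string. A string template expands \2 and \1; a function that returns the same characters gets them inserted literally, so the swap has to be rebuilt from the match object with m.group().

```python
import re

text = "width=10, height=20"  # hypothetical sample input

# String replacement: \2 and \1 are group references in the template.
swapped_template = re.sub(r"(\w+)=(\w+)", r"\2=\1", text)

# Function replacement: the returned string is used as-is...
literal = re.sub(r"(\w+)=(\w+)", lambda m: r"\2=\1", text)

# ...so the same swap must be built by hand from the match object.
swapped_function = re.sub(
    r"(\w+)=(\w+)", lambda m: m.group(2) + "=" + m.group(1), text
)

print(swapped_template)
print(literal)
print(swapped_function)
```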

You don't have any placeholders in your code, so a lambda to create the function should suffice:

a = re.sub(r'#include "abc\\type\\GenericTypeMT\.h"', lambda m: b, a)

Demo on the same contrived foo sample string from before:

>>> re.sub(r'^.', lambda m: '\\t', 'foo')
'\\too'
>>> print(re.sub(r'^.', lambda m: '\\t', 'foo'))
\too
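Applied to the question's data, a sketch of the lambda fix might look like this. The header contents and the type names (standing in for file.h and types.txt) are hypothetical, and everything is kept in memory so the effect on the \t sequences is easy to check.

```python
import re

INCLUDE = '#include "abc\\type\\'
# Hypothetical contents of types.txt, one name per line in the real file.
type_names = ["GenericType", "ServiceType", "AnotherType"]

# Build the block of include lines the same way the question does.
b = "".join(INCLUDE + name + 'MT.h"\n' for name in type_names)

# Hypothetical stand-in for the contents of file.h.
a = '/* INCLUDES */\n#include "abc\\type\\GenericTypeMT.h"\n\n#include <map>\n'

# The lambda returns b unchanged, so re.sub() never parses it as a template.
a = re.sub(r'#include "abc\\type\\GenericTypeMT\.h"', lambda m: b, a)
print(a)
```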

Unfortunately, re.escape() is too greedy here: it adds \ backslashes to many more characters than just the replacement meta-characters, so you'd end up with more backslashes than you started with.
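To see the difference, the sketch below compares re.escape() with an alternative that keeps a string replacement: doubling only the backslashes, since the template turns each \\ back into a single \. The exact characters re.escape() touches differ between Python versions, so the sketch only checks that it adds extra backslashes.

```python
import re

pattern = r'#include "abc\\type\\GenericTypeMT\.h"'
text = '#include "abc\\type\\GenericTypeMT.h"'
b = '#include "abc\\type\\GenericTypeMT.h"\n'

# re.escape() is meant for patterns; escapes such as "\#" or "\." are not
# template escapes, so their backslashes survive into the output.
via_escape = re.sub(pattern, re.escape(b), text)

# Doubling only the backslashes round-trips exactly through the template.
via_doubling = re.sub(pattern, b.replace("\\", "\\\\"), text)

print(repr(via_escape))
print(repr(via_doubling))
```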

Note that because you don't actually do any pattern matching in your substitution, you may as well just use str.replace() to do the job:

a = a.replace(r'#include "abc\type\GenericTypeMT.h"', b)

With str.replace(), \ and . are no longer metacharacters, so they don't need escaping either. The r prefix on the search string is still needed, though, so that Python itself doesn't turn \t into a tab.
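A minimal sketch of the str.replace() version, again on hypothetical in-memory contents in place of file.h. It also shows why the raw string prefix still matters: without it, the \t in the search literal is a real tab and never matches the file.

```python
original = '/* INCLUDES */\n#include "abc\\type\\GenericTypeMT.h"\n\n#include <map>\n'
b = ('#include "abc\\type\\GenericTypeMT.h"\n'
     '#include "abc\\type\\ServiceTypeMT.h"\n')

# Plain substring search and replace: nothing in b is interpreted.
a = original.replace(r'#include "abc\type\GenericTypeMT.h"', b)

# Without the r prefix, "\t" here is a tab character, so no match.
print('#include "abc\type' in original)  # False
print(a)
```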

Upvotes: 2
