Ryan M

Reputation: 687

Python: unexpected behavior with printing/writing escape characters

I'm trying to read a file that may contain strings that include \\, \n, and \t, and I want to write those to another file as \, newline, and tab. My attempt with re.sub doesn't seem to be working in my .py file, but it seems to be working in the interpreter.

Here's the function I wrote to try to achieve this:

import re

def escape_parser(snippet):
    snippet = re.sub(r"\\", "\\", snippet)
    snippet = re.sub(r"\t", "\t", snippet)
    snippet = re.sub(r"\n", "\n", snippet)

    return snippet

which causes sre_constants.error: bogus escape (end of line) when the backslash replacement line is included, and doesn't appear to replace the literal string \t or \n with a tab or newline when I comment out the backslash line.

I played around in the interpreter to see if I could figure out a solution, but everything behaved as I'd (naively) expect.

$ python3
Python 3.4.0 (default, Mar 24 2014, 02:28:52) 
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> test = "for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}"
>>> import re
>>> test = "for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}"
>>> print(re.sub(r"\n", "\n", test))
for(int ${1:i}; $1 < ${2:STOP}; ++$1)
{
    $0
}
>>> print(test)
for(int ${1:i}; $1 < ${2:STOP}; ++$1)
{
    $0
}
>>> test
'for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}'
>>> t2 = re.sub(r"\n", "foo", test)
>>> t2
'for(int ${1:i}; $1 < ${2:STOP}; ++$1)foo{foo\t$0foo}'

As for actually writing to the file, I have

with open(os.path.join(target_path, name), "w") as out:
    out.write(snippet)

Although I've tried using print(snippet, end="", file=out), too.

Edit: I've looked at similar questions like "Python how to replace backslash with re.sub()" and "How to write list of strings to file, adding newlines?", but those solutions don't quite work, and I'd really like to do this with a regex if possible, because regexes seem like a more powerful tool than Python's standard string processing functions.

Edit2: Not sure if this helps, but I thought I'd try to print what's going on in the function:

def escape_parser(snippet):
    print(snippet)
    print("{!r}".format(snippet))

    # snippet = re.sub(r"\\", "\\", snippet)
    snippet = re.sub(r"\t", "\t", snippet)
    snippet = re.sub(r"\n", "\n", snippet)

    print(snippet)
    print("{!r}".format(snippet))

    return snippet

yields

for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}
'for(int ${1:i}; $1 < ${2:STOP}; ++$1)\\n{\\n\\t$0\\n}'
for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}
'for(int ${1:i}; $1 < ${2:STOP}; ++$1)\\n{\\n\\t$0\\n}'
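The repr output above suggests the data read from the file holds a literal backslash followed by n or t, whereas the string typed in the interpreter held real newline and tab characters. A minimal sketch of that difference, using made-up sample strings (interpreter_text and file_text are hypothetical names, not from my script):

```python
import re

# In a normal string literal, "\n" and "\t" are single control characters.
interpreter_text = "a\n\tb"
# In a raw literal, the same text is a backslash followed by a letter,
# which is what the repr output from the file data looks like.
file_text = r"a\n\tb"

# The pattern r"\n" is the regex escape for a newline character,
# so it only matches real newlines and leaves file_text untouched.
untouched = re.sub(r"\n", "foo", file_text)

# Matching a literal backslash followed by 'n' or 't' requires
# escaping the backslash in the pattern.
converted = re.sub(r"\\n", "\n", file_text)
converted = re.sub(r"\\t", "\t", converted)

print(repr(untouched))
print(repr(converted))
```

If this reading is right, it would explain why the same re.sub calls appear to work in the interpreter but do nothing to the file contents.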

Edit3: Changing snippet = re.sub(r"\\", "\\", snippet) to snippet = re.sub(r"\\", r"\\", snippet) as per @BrenBarn's advice, and adding a test string in my source file yields

insert just one backslash: \\ (that's it)
"insert just one backslash: \\\\ (that's it)"
insert just one backslash: \\ (that's it)
"insert just one backslash: \\\\ (that's it)"

So I must have missed something obvious. It's a good thing one doesn't need a license to program.

Edit4: As per Process escape sequences in a string in Python, I changed escape_parser to this:

def escape_parser(snippet):
    print("pre-escaping: '{}'".format(snippet))

    # snippet = re.sub(r"\\", r"\\", snippet)
    # snippet = re.sub(r"\t", "\t", snippet)
    # snippet = re.sub(r"\n", "\n", snippet)
    snippet = bytes(snippet, "utf-8").decode("unicode_escape")

    print("post-escaping: '{}'".format(snippet))

    return snippet

which works in a sense. My original intention was to replace only \\, \n, and \t, but unicode_escape processes every escape sequence, which isn't exactly what I wanted. (print and write appear to behave the same for these strings. I may have been mistaken earlier about them not matching up: the editor I was using to inspect the output files apparently didn't refresh when the files changed.) Here's how things look after being run through the function:

pre-escaping: 'for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}'
post-escaping: 'for(int ${1:i}; $1 < ${2:STOP}; ++$1)
{
    $0
}'
pre-escaping: 'insert just one backslash: \\ (that's it)'
post-escaping: 'insert just one backslash: \ (that's it)'
pre-escaping: 'source has one backslash \ <- right there'
post-escaping: 'source has one backslash \ <- right there'
pre-escaping: 'what about a bell \a like that?'
post-escaping: 'what about a bell  like that?'
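For comparison, here is a rough sketch of a regex-based version that touches only \\, \n, and \t and leaves everything else (such as \a) alone. The names ESCAPES, ESCAPE_RE, and escape_parser_limited are hypothetical, and this is one possible approach rather than a vetted solution:

```python
import re

# Hypothetical lookup table: the only three sequences that get translated.
ESCAPES = {
    "\\\\": "\\",  # two backslashes -> one backslash
    "\\n": "\n",   # backslash + n  -> newline
    "\\t": "\t",   # backslash + t  -> tab
}

# A backslash followed by a backslash, 'n', or 't'.
ESCAPE_RE = re.compile(r"\\[\\nt]")

def escape_parser_limited(snippet):
    # Passing a function as the replacement inserts its return value verbatim,
    # so re.sub does not interpret backslashes in the result.
    return ESCAPE_RE.sub(lambda match: ESCAPES[match.group(0)], snippet)

print(repr(escape_parser_limited(r"for(...)\n{\n\t$0\n}")))
print(repr(escape_parser_limited(r"what about a bell \a like that?")))
```

Doing all three substitutions in a single pass matters: an escaped backslash followed by n (\\n in the file) becomes a backslash and an n, not a newline, which chained re.sub calls could get wrong depending on their order.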

Upvotes: 0

Views: 294

Answers (1)

BrenBarn

Reputation: 251408

It's hard to tell if this is your main problem without seeing some data, but one problem is that you need to change your first replace to:

snippet = re.sub(r"\\", r"\\", snippet)

The reason is that backslashes have meaning in the replacement pattern as well (for group backreferences), so a single backslash is not a valid replacement string.
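A small sketch of that behavior, using a made-up sample string (the variable names are only for illustration). Note that replacing a backslash with r"\\" avoids the error but puts back exactly one backslash, so collapsing an escaped pair would need a pattern that matches two:

```python
import re

text = r"one \\ two"  # sample data containing two backslash characters

# A lone backslash as the replacement string is an incomplete escape,
# so re.sub rejects it.
try:
    re.sub(r"\\", "\\", text)
    raised = False
except re.error:
    raised = True

# r"\\" as the replacement stands for one literal backslash, so this
# swaps each backslash for itself and the text is unchanged.
unchanged = re.sub(r"\\", r"\\", text)

# To turn an escaped pair into a single backslash, the pattern has to
# match two literal backslashes.
collapsed = re.sub(r"\\\\", r"\\", text)

print(raised, repr(unchanged), repr(collapsed))
```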

Upvotes: 2
