Reputation: 687
I'm trying to read a file that may contain strings that include \\, \n, and \t, and I want to write those to another file as \, newline, and tab. My attempt with re.sub doesn't seem to be working in my .py file, but it seems to be working in the interpreter.
Here's the function I wrote to try to achieve this:
def escape_parser(snippet):
    snippet = re.sub(r"\\", "\\", snippet)
    snippet = re.sub(r"\t", "\t", snippet)
    snippet = re.sub(r"\n", "\n", snippet)
    return snippet
which causes sre_constants.error: bogus escape (end of line) when the backslash replacement line is included, and doesn't appear to replace the literal string \t or \n with a tab or newline when I comment out the backslash line.
I played around in the interpreter to see if I could figure out a solution, but everything behaved as I'd (naively) expect.
$ python3
Python 3.4.0 (default, Mar 24 2014, 02:28:52)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> test = "for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}"
>>> import re
>>> test = "for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}"
>>> print(re.sub(r"\n", "\n", test))
for(int ${1:i}; $1 < ${2:STOP}; ++$1)
{
$0
}
>>> print(test)
for(int ${1:i}; $1 < ${2:STOP}; ++$1)
{
$0
}
>>> test
'for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}'
>>> t2 = re.sub(r"\n", "foo", test)
>>> t2
'for(int ${1:i}; $1 < ${2:STOP}; ++$1)foo{foo\t$0foo}'
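One thing that may explain why the session above looks fine: the literal assigned to test already contains real newline and tab characters, so it does not exercise the case of a backslash followed by n. In the re module, the pattern r"\n" matches a newline character; matching the two-character sequence needs r"\\n". A minimal sketch of the difference, assuming the file data holds backslash-plus-letter pairs (the names real and literal are hypothetical):

```python
import re

real = "a\nb"        # 3 characters: a, newline, b
literal = r"a\nb"    # 4 characters: a, backslash, n, b

# r"\n" as a pattern matches a newline character, not backslash + n.
assert re.sub(r"\n", "foo", real) == "afoob"
assert re.sub(r"\n", "foo", literal) == literal   # nothing matched

# r"\\n" matches a backslash followed by n.
assert re.sub(r"\\n", "\n", literal) == real
```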
As for actually writing to the file, I have
with open(os.path.join(target_path, name), "w") as out:
    out.write(snippet)
Although I've tried using print(snippet, end="", file=out), too.
Edit: I've looked at similar questions like "Python how to replace backslash with re.sub()" and "How to write list of strings to file, adding newlines?", but those solutions don't quite work. I'd really like to do this with a regex if possible, because regexes seem like a more powerful tool than Python's standard string processing functions.
Edit2: Not sure if this helps, but I thought I'd try to print what's going on in the function:
def escape_parser(snippet):
    print(snippet)
    print("{!r}".format(snippet))
    # snippet = re.sub(r"\\", "\\", snippet)
    snippet = re.sub(r"\t", "\t", snippet)
    snippet = re.sub(r"\n", "\n", snippet)
    print(snippet)
    print("{!r}".format(snippet))
    return snippet
yields
for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}
'for(int ${1:i}; $1 < ${2:STOP}; ++$1)\\n{\\n\\t$0\\n}'
for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}
'for(int ${1:i}; $1 < ${2:STOP}; ++$1)\\n{\\n\\t$0\\n}'
Edit3: Changing snippet = re.sub(r"\\", "\\", snippet) to snippet = re.sub(r"\\", r"\\", snippet) as per @BrenBarn's advice, and adding a test string in my source file, yields
insert just one backslash: \\ (that's it)
"insert just one backslash: \\\\ (that's it)"
insert just one backslash: \\ (that's it)
"insert just one backslash: \\\\ (that's it)"
So I must have missed something obvious. It's a good thing one doesn't need a license to program.
Edit4: As per "Process escape sequences in a string in Python", I changed escape_parser to this:
def escape_parser(snippet):
    print("pre-escaping: '{}'".format(snippet))
    # snippet = re.sub(r"\\", r"\\", snippet)
    # snippet = re.sub(r"\t", "\t", snippet)
    # snippet = re.sub(r"\n", "\n", snippet)
    snippet = bytes(snippet, "utf-8").decode("unicode_escape")
    print("post-escaping: '{}'".format(snippet))
    return snippet
which works in a sense. My original intention was to only replace \\, \n, and \t, but this goes further than that, which isn't exactly what I wanted. Here's how things look after being run through the function (it appears print and write work the same for these; I may have been mistaken about print and write not matching up, because it appears the editor I was using to inspect the output files wouldn't update if new changes were made):
pre-escaping: 'for(int ${1:i}; $1 < ${2:STOP}; ++$1)\n{\n\t$0\n}'
post-escaping: 'for(int ${1:i}; $1 < ${2:STOP}; ++$1)
{
$0
}'
pre-escaping: 'insert just one backslash: \\ (that's it)'
post-escaping: 'insert just one backslash: \ (that's it)'
pre-escaping: 'source has one backslash \ <- right there'
post-escaping: 'source has one backslash \ <- right there'
pre-escaping: 'what about a bell \a like that?'
post-escaping: 'what about a bell like that?'
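If the goal is to translate only \\, \n, and \t and leave every other backslash sequence (such as \a) untouched, one possible approach is a single re.sub pass with a function as the replacement. This is a sketch under that assumption, not a tested drop-in; the names unescape_basic and _ESCAPES are hypothetical:

```python
import re

# Hypothetical mapping: only these three two-character escapes are translated.
_ESCAPES = {
    "\\\\": "\\",  # backslash + backslash -> one backslash
    "\\n": "\n",   # backslash + n -> newline
    "\\t": "\t",   # backslash + t -> tab
}

def unescape_basic(snippet):
    # A single left-to-right pass, so a backslash produced from a doubled
    # backslash is never re-read as the start of another escape.
    # Using a function as the replacement avoids re's own handling of
    # backslashes in replacement strings.
    return re.sub(r"\\[\\nt]", lambda m: _ESCAPES[m.group(0)], snippet)
```

With this sketch, a doubled backslash followed by n comes out as a single backslash followed by n rather than as a newline, which three separate re.sub calls applied one after another would not guarantee.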
Upvotes: 0
Views: 294
Reputation: 251408
It's hard to tell if this is your main problem without seeing some data, but one problem is that you need to change your first replace to:
snippet = re.sub(r"\\", r"\\", snippet)
The reason is that backslashes have meaning in the replacement pattern as well (for group backreferences), so a single backslash is not a valid replacement string.
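A short sketch of that behavior, using a made-up sample string s. Note that r"\\" as the pattern matches a single backslash, so replacing it with r"\\" (which also produces a single backslash) leaves the string unchanged; collapsing a doubled backslash into one requires matching two of them:

```python
import re

s = r"a\\b"   # 4 characters: a, backslash, backslash, b

# A lone backslash as the replacement template is rejected by re.
try:
    re.sub(r"\\", "\\", s)
except re.error as exc:
    print("re.error:", exc)

# r"\\" as the template yields one backslash per matched backslash,
# so the result is identical to the input.
assert re.sub(r"\\", r"\\", s) == s

# To collapse doubled backslashes, match two and emit one.
assert re.sub(r"\\\\", r"\\", s) == r"a\b"
```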
Upvotes: 2