Bala

Reputation: 23

Unescape strings in Python

I have an input file that contains a list of inputs, one per line. Each line is enclosed in double quotes. The inputs sometimes contain backslashes or a few escaped double quotes within the enclosing double quotes (see the examples below).

Sample inputs:

"each line is enclosed in double-quotes"
"Double quotes inside a \"double-quoted\" string!"
"This line contains backslashes \\not so cool\\"
"too many double-quotes in a line \"\"\"too much\"\"\""
"too many backslashes \\\\\\\"horrible\"\\\\\\"

I would like to take the above inputs and simply convert each escaped double quote (\") inside a line to a back-tick (`), leaving the enclosing double quotes and the backslashes untouched.

I assume that there is a straightforward one-line solution to this. I tried the following but it doesn't work. Any other one-liner solution or a fix to the below code would be greatly appreciated.

import re

def fix(line):
    # Replace every backslash followed by a double quote with a back-tick
    return re.sub(r'\\"', '`', line)

It fails for input lines 3 and 5, where the closing double quote is wrongly replaced. This is the output I get:

"each line is enclosed in double-quotes"
"Double quotes inside a `double-quoted` string!"
"This line contains backslashes \\not so cool\`
"too many double-quotes in a line ```too much```"
"too many backslashes \\\\\\`horrible`\\\\\`

Any fix I can think of breaks other lines. Please help!

Upvotes: 2

Views: 702

Answers (2)

donkopotamus

Reputation: 23206

This is not quite what you asked for, as it replaces with " rather than `, but I'll mention it anyway: you could always use the csv module to do the \" conversion correctly for you:

>>> import csv
>>> for line in csv.reader(["each line is enclosed in double-quotes",
...                         "Double quotes inside a \"double-quoted\" string!",
...                         "This line contains backslashes \\not so cool\\",
...                         "too many double-quotes in a line \"\"\"too much\"\"\"",
...                         "too many backslashes \\\\\\\"horrible\"\\\\\\",
...                         ]):
...         print(line)
...     
['each line is enclosed in double-quotes']
['Double quotes inside a "double-quoted" string!']
['This line contains backslashes \\not so cool\\']
['too many double-quotes in a line """too much"""']
['too many backslashes \\\\\\"horrible"\\\\\\']

If it is then important that they be actual `'s, you could simply do a replace on the text returned by the csv module.
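
When the lines come from a file, the backslashes are literal characters rather than Python string escapes, so the reader probably has to be told about them. A minimal sketch, assuming the file uses backslash as the escape character and never doubles quotes: `escapechar` and `doublequote` are real csv options, while the helper name `unescape_with_csv` is made up for illustration. Note that, unlike the output requested in the question, this drops the enclosing quotes and also collapses each `\\` to a single backslash.

```python
import csv

def unescape_with_csv(lines):
    """Parse backslash-escaped, double-quoted lines; turn inner quotes into back-ticks."""
    # escapechar='\\' makes the reader treat \" and \\ as escaped characters;
    # doublequote=False stops it from reading "" as an escaped quote.
    reader = csv.reader(lines, escapechar='\\', doublequote=False)
    # Each input line holds a single quoted field; skip empty rows (blank lines).
    return [row[0].replace('"', '`') for row in reader if row]

# Raw strings stand in for lines as they would be read from the file.
raw_lines = [
    r'"Double quotes inside a \"double-quoted\" string!"',
    r'"This line contains backslashes \\not so cool\\"',
]
for text in unescape_with_csv(raw_lines):
    print(text)
```

In real use, an open file object (opened with newline='') could be passed in place of the list.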

Upvotes: 2

Avinash Raj

Reputation: 174776

Add a + after the backslash in the pattern, so that a run of one or more backslashes before the double quote is matched:

return re.sub(r'\\+"', '`', line)
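
One caveat: a run of backslashes followed by a quote is matched regardless of its length, so on sample line 3 the trailing `\\"` would still be swallowed along with the closing quote. A possible alternative, sketched under the assumption that `\\` is an escaped backslash and `\"` an escaped quote, is to consume escape sequences in pairs from left to right, so a quote only counts as escaped when an odd number of backslashes precedes it. The name `fix_escaped_quotes` is hypothetical.

```python
import re

# A backslash followed by either another backslash or a double quote.
# Matching pairs left to right keeps the backslash parity correct.
_ESCAPE_PAIR = re.compile(r'\\([\\"])')

def fix_escaped_quotes(line):
    """Replace each escaped double quote with a back-tick; keep everything else."""
    def _swap(match):
        # \" becomes a back-tick; \\ is written back unchanged.
        return '`' if match.group(1) == '"' else match.group(0)
    return _ESCAPE_PAIR.sub(_swap, line)

# Raw strings stand in for lines as they would be read from the file.
samples = [
    r'"Double quotes inside a \"double-quoted\" string!"',
    r'"This line contains backslashes \\not so cool\\"',
    r'"too many backslashes \\\\\\\"horrible\"\\\\\\"',
]
for sample in samples:
    print(fix_escaped_quotes(sample))
```

With this approach the enclosing double quotes and all backslashes stay in place, which matches the output format shown in the question.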

Upvotes: 1
