Reputation: 622
I am parsing values from a file, some of which can be string literals, enclosed in double quotes. To get the actual value I have to strip the double quotes:
>>> raw_value = r'"I am a string"'
>>> processed_value = raw_value.strip('"')
>>> print(processed_value)
I am a string
However, some values contain escaped double quotes, which can be at the end:
>>> raw_value = r'"Simon said: \"Jump!\""'
>>> processed_value = raw_value.strip('"')
>>> print(processed_value)
Simon said: \"Jump!\
You see my problem here: the quote belonging to the trailing escaped pair is stripped as well, which leaves an orphaned backslash. When I write the file back, that backslash escapes the closing double quote and makes the file unreadable. I could do:
def unique_strip(some_str):
    beginning = 1 if some_str.startswith('"') else 0
    end = -1 if some_str.endswith('"') and some_str[-2] != "\\" else None
    return some_str[beginning:end]
Using the previous example:
>>> unique_strip(raw_value)
'Simon said: \\"Jump!\\"'
>>> raw_value = r'"Simon said: \"Jump!\"'
>>> unique_strip(raw_value)
'Simon said: \\"Jump!\\"'
So now it even works if the trailing double quote is missing. Is there a more Pythonic way to do this, using the built-in strip for example? If not, is there anything wrong with my method, or any loophole in it?
Update
I guess my function raises IndexError for an input like some_str = '"'. So maybe:
def unique_strip(some_str):
    beginning = 1 if some_str.startswith('"') else 0
    end = -1 if len(some_str) > 1 and some_str.endswith('"') and some_str[-2] != "\\" else None
    return some_str[beginning:end]
Upvotes: 0
Views: 2073
Reputation: 7351
The easiest, though not the safest, way is to replace each \" with some string that will not occur elsewhere, then strip, and then replace it back.
raw_value = r'"Simon said: \"Jump!\""'
IMPOSSIBLE_STR = '\\"3'
raw_value.replace('\\"', IMPOSSIBLE_STR).strip('"').replace(IMPOSSIBLE_STR,'\\"')
Out[102]: 'Simon said: \\"Jump!\\"'
I suppose it's very unlikely to have \" followed by a number.
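The same replace, strip, replace-back idea could be wrapped in a small function. This is only a sketch: `strip_outer_quotes` and its `placeholder` parameter are names made up here, and it assumes a NUL character never appears in the values being parsed (it raises if one does).

```python
def strip_outer_quotes(raw, placeholder="\x00"):
    """Strip outer double quotes while protecting escaped ones (\\")."""
    # Assumption: the placeholder never occurs in real input; refuse otherwise.
    if placeholder in raw:
        raise ValueError("placeholder already occurs in the input")
    # Hide every escaped quote so that strip() cannot touch it.
    protected = raw.replace('\\"', placeholder)
    # strip() removes all leading and trailing unescaped quotes.
    stripped = protected.strip('"')
    # Put the escaped quotes back.
    return stripped.replace(placeholder, '\\"')
```

Note that, like plain strip, this removes every leading and trailing unescaped quote, not just one on each side.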
A regex will probably solve the problem better, provided you write the correct regex!
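For what it's worth, here is one possible regex, offered as a sketch rather than a definitive solution. It assumes a backslash always escapes the character that follows it, so an escaped backslash right before the closing quote (as in `"C:\\"`) does not protect that quote. The names `_QUOTED` and `strip_quotes_re` are made up for this example.

```python
import re

# Optional opening quote, then a body made of escape pairs (backslash plus
# any character) or characters that are neither a quote nor a backslash,
# then an optional closing quote.
_QUOTED = re.compile(r'"?((?:\\.|[^"\\])*)"?', re.DOTALL)

def strip_quotes_re(raw):
    """Return raw without its outer unescaped double quotes."""
    match = _QUOTED.fullmatch(raw)
    # Input that does not fit the pattern (e.g. an unescaped inner quote)
    # is returned unchanged instead of being guessed at.
    return match.group(1) if match else raw
```

Unlike strip, this removes at most one quote from each end, and it still works when the trailing quote is missing.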
Upvotes: 2