Valentin B.

Reputation: 622

Stripping a character only once in Python

I am parsing values from a file, some of which can be string literals, enclosed in double quotes. To get the actual value I have to strip the double quotes:

>>> raw_value = r'"I am a string"'
>>> processed_value = raw_value.strip('"')
>>> print(processed_value)
I am a string

However, some values contain escaped double quotes, which can be at the end:

>>> raw_value = r'"Simon said: \"Jump!\""'
>>> processed_value = raw_value.strip('"')
>>> print(processed_value)
Simon said: \"Jump!\

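You see my problem here: strip removes every trailing double quote, including the one that belongs to the final escape sequence. That leaves a dangling backslash, so when I write the file back the closing quote ends up escaped and the value becomes unreadable. I could do: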

def unique_strip(some_str):

    beginning = 1 if some_str.startswith('"') else 0
    end = -1 if some_str.endswith('"') and some_str[-2] != "\\" else None
    return some_str[beginning:end]

Using previous example:

>>> unique_strip(raw_value)
'Simon said: \\"Jump!\\"'
>>> raw_value = r'"Simon said: \"Jump!\"'
>>> unique_strip(raw_value)
'Simon said: \\"Jump!\\"'

So now it even works if the trailing double quote is missing. Is there a more Pythonic way to do this, using the built-in strip for example? If not, is there anything wrong with my method, or any loophole in it?


Update

I guess my function raises IndexError for an input like some_str = '"'. So maybe:

def unique_strip(some_str):

    beginning = 1 if some_str.startswith('"') else 0
    end = -1 if len(some_str) > 1 and some_str.endswith('"') and some_str[-2] != "\\" else None
    return some_str[beginning:end]
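
One loophole that seems to remain: a value ending in an escaped backslash, such as r'"C:\\"', has a backslash right before the closing quote, so the check above leaves that quote in place. Below is a minimal sketch that counts the trailing backslashes instead. The name strip_outer_quotes is hypothetical, and the sketch assumes a backslash is the only escape character in the file format.

```python
def strip_outer_quotes(some_str):
    """Remove at most one unescaped double quote from each end."""
    start = 1 if some_str.startswith('"') else 0
    end = len(some_str)
    # Only look for a closing quote if something is left after the opening one.
    if end > start and some_str.endswith('"'):
        body = some_str[start:-1]
        # Number of backslashes immediately before the final quote.
        trailing = len(body) - len(body.rstrip("\\"))
        # An even count means the backslashes escape each other,
        # so the final quote is a real closing quote.
        if trailing % 2 == 0:
            end -= 1
    return some_str[start:end]
```

With this version r'"C:\\"' loses both outer quotes, while r'"Simon said: \"Jump!\"' (closing quote missing) keeps its final escaped quote.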

Upvotes: 0

Views: 2073

Answers (1)

jf328

Reputation: 7351

The easiest, though not the safest, way is to replace each \" with a placeholder string that does not occur anywhere else in the data, then strip, then replace the placeholder back.

raw_value = r'"Simon said: \"Jump!\""'

IMPOSSIBLE_STR = '\\"3'
raw_value.replace('\\"', IMPOSSIBLE_STR).strip('"').replace(IMPOSSIBLE_STR,'\\"')
Out[102]: 'Simon said: \\"Jump!\\"'

I suppose it's very unlikely for the data to contain \" followed by a 3.

A regex will probably solve the problem better, provided you write the correct regex!
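
As a rough sketch of what such a regex could look like (not tested against your actual file format): the pattern below treats a backslash plus the following character as one unit, so an escaped quote can never be mistaken for the closing one. The names QUOTED and strip_quotes_re are made up for this example, and it assumes backslash is the only escape character.

```python
import re

# Optional opening quote, then a body made of escaped pairs (backslash + any
# character) or ordinary characters, then an optional closing quote.
QUOTED = re.compile(r'"?((?:\\.|[^"\\])*)"?', re.DOTALL)


def strip_quotes_re(raw):
    """Return the body of a quoted literal, or raw unchanged if it does not match."""
    match = QUOTED.fullmatch(raw)
    # Input with a bare inner quote or a lone trailing backslash does not
    # match the pattern and is returned untouched.
    return match.group(1) if match else raw
```

Because the two alternatives in the body can never start with the same character, the pattern should not backtrack badly on long values. It also accepts a value whose closing quote is missing, like the second example in the question.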

Upvotes: 2
