Reputation: 198
Given a single-quoted string that contains a double-quoted substring with a '\n' inside it, calling splitlines on the outer string also splits the inner quoted part.
double_quote_in_simple_quote = 'v\n"x\ny"\nz'
print(double_quote_in_simple_quote.splitlines())
Resulting output
['v', '"x', 'y"', 'z']
I would have expected the following:
['v', '"x\ny"', 'z']
Because the '\n' is in the scope of the sub-quote.
I was hoping for an explanation of why it behaves this way, and whether there is an alternative to 'splitlines' that only splits at the level of the outer quote.
Thank you
Upvotes: 1
Views: 795
Reputation: 189749
The split function doesn't care about additional levels of quoting; it simply splits on every occurrence of the character you split on. (There isn't really a concept of nested quoting; a string is a string, and may or may not contain literal quotes, which are treated the same as any other character.)
If you want to implement quoting inside of strings, you have to do it yourself.
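Perhaps use a regular expression:
import re
# Tokenize into double-quoted spans and the unquoted text between them
tokens = re.findall(r'"[^"]*"|[^"]*', double_quote_in_simple_quote)
# Keep quoted tokens whole; split unquoted tokens on newlines
splitresult = [
    x if x.startswith('"') else x.split('\n')
    for x in tokens]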
Demo: https://ideone.com/lAgJTb
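The snippet above produces a mix of strings and nested lists, and the pattern can also yield empty strings. If a flat list like the one expected in the question is wanted, a minimal sketch could look like the following. The function name split_lines_outside_quotes is made up for illustration, and the sketch assumes balanced double quotes, no escaped quotes inside them, and that empty lines can be dropped.

```python
import re

def split_lines_outside_quotes(text):
    # Hypothetical helper: each match is one "line", built from
    # double-quoted spans (which may contain '\n') and from characters
    # that are neither a newline nor a double quote.
    return re.findall(r'(?:"[^"]*"|[^"\n])+', text)

print(split_lines_outside_quotes('v\n"x\ny"\nz'))
# ['v', '"x\ny"', 'z']
```

Because the newline is excluded only outside the quoted alternative, a '\n' between double quotes stays part of its token.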
Upvotes: 1
Reputation: 993
If you put double quotes inside a string in Python, that doesn't create a nested string, per se. Python starts and ends the string object at the outermost quotes. Any quote characters inside are treated as ordinary characters.
>>> print('dog')
dog
>>> print('"dog"')
"dog"
Note how in the second line, the quotes are also printed, because those actual quote-characters are a part of the string. No nesting happening.
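A quick way to check this is to count characters. The short sketch below (the variable names are only for illustration) shows that the inner quotes are part of the string's length and content, so splitlines has no reason to treat them specially.

```python
s = '"dog"'
print(len(s))        # 5: the two quote characters are counted
print(s[0] == '"')   # True: the first character is a literal double quote

text = 'v\n"x\ny"\nz'
print(text.count('"'))    # 2: just two ordinary characters
print(text.count('\n'))   # 3: all three newlines are equal candidates for splitting
```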
Upvotes: 1
Reputation: 345
I came up with a hacky workaround that can tide you over while you look for a method that splits lines without this behavior.
double_quote_in_simple_quote = '"x\ny"'
double_quote_in_simple_quote = double_quote_in_simple_quote.replace("\n", "$n")
splitted_quote = double_quote_in_simple_quote.splitlines()
print(splitted_quote)
splitted_quote_decoded = [quote.replace('$n', '\n') for quote in splitted_quote]
print(splitted_quote_decoded)
The idea is to replace the \n with a placeholder that has no special meaning and does not otherwise occur in the string, split, and then reverse the replacement. I used your example, but I'm sure you will be able to tune it to fit your needs. My output was:
['"x$ny"']
['"x\ny"']
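One possible way to tune this for the full string from the question is to apply the placeholder only inside the double-quoted parts, so that the outer newlines are still split. This is a sketch under assumptions: the helper name splitlines_keeping_quoted is invented here, "\x00" is assumed never to occur in the input, quotes are assumed to be balanced, and only '\n' (not other line terminators) is protected.

```python
import re

PLACEHOLDER = "\x00"  # assumption: this character never appears in the input

def splitlines_keeping_quoted(text):
    # Hide newlines that sit inside double-quoted spans.
    protected = re.sub(
        r'"[^"]*"',
        lambda m: m.group(0).replace("\n", PLACEHOLDER),
        text,
    )
    # Split as usual, then put the hidden newlines back.
    return [line.replace(PLACEHOLDER, "\n") for line in protected.splitlines()]

print(splitlines_keeping_quoted('v\n"x\ny"\nz'))
# ['v', '"x\ny"', 'z']
```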
Upvotes: 1
Reputation: 464
It is due to the nature of escape sequences in Python. \n in Python means a newline character: when it appears in a string literal, Python stores a single line break character rather than a backslash and an n. The splitlines() method splits a string into a list, and the splitting is done at line breaks. That's why you get a list without newline characters.
However, note that passing keepends=True does not prevent the split; it only keeps the line break characters at the end of each piece. With the string from the question:
print(double_quote_in_simple_quote.splitlines(keepends=True))
>>> ['v\n', '"x\n', 'y"\n', 'z']
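To see what the escape sequence and keepends actually do, here is a small check using the string from the question. The variable names are only illustrative. A result such as ['"x\\ny"'] appears only when the string holds a literal backslash followed by n (for example from a raw string), in which case there is no line break to split on.

```python
s = 'v\n"x\ny"\nz'

print(len('\n'), len(r'\n'))         # 1 2: '\n' is one character, r'\n' is two
print(s.splitlines())                # ['v', '"x', 'y"', 'z']
print(s.splitlines(keepends=True))   # ['v\n', '"x\n', 'y"\n', 'z']

raw = r'"x\ny"'                      # backslash + n, not a newline
print(raw.splitlines())              # ['"x\\ny"']
```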
Upvotes: 1