Julienm

Reputation: 198

splitlines of quote splits '\n' in sub-quote

I have a single-quoted string that contains a double-quoted substring with a '\n' inside it. When I call splitlines on the outer string, the inner quoted part is split too.

double_quote_in_simple_quote = 'v\n"x\ny"\nz'
print(double_quote_in_simple_quote.splitlines())

Resulting output:

['v', '"x', 'y"', 'z']

I would have expected the following:

['v', '"x\ny"', 'z']

Because the '\n' is in the scope of the sub-quote.

Could someone explain why it behaves this way, and is there an alternative to 'splitlines' that only splits at the level of the outer quote?

Thank you

Upvotes: 1

Views: 795

Answers (4)

tripleee

Reputation: 189749

The split function doesn't care about additional levels of quoting; it simply splits on every occurrence of the character you split on. (There isn't really a concept of nested quoting; a string is a string, and may or may not contain literal quotes, which are treated the same as any other character.)

If you want to implement quoting inside of strings, you have to do it yourself.

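Perhaps use a regular expression:

import re

# The string from the question.
double_quote_in_simple_quote = 'v\n"x\ny"\nz'

# Tokenize into double-quoted chunks and the text between them
# (findall may also return empty strings here).
tokens = re.findall(r'"[^"]*"|[^"]*', double_quote_in_simple_quote)
# Keep quoted chunks whole; split the rest on newlines.
# Unquoted tokens become sub-lists, so the result is nested.
splitresult = [
    x if x.startswith('"') else x.split('\n')
    for x in tokens]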

Demo: https://ideone.com/lAgJTb
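If a flat list like the one in the question is wanted, one possible variant is sketched below. This is not the code behind the demo link: the function name `split_outside_quotes` is made up for illustration, and the sketch assumes double quotes are balanced and never escaped. It also drops empty lines and skips a stray unmatched quote, so treat it as a starting point only.

```python
import re

def split_outside_quotes(text):
    # Each match is a run of either whole double-quoted chunks
    # (which may contain newlines) or single characters that are
    # neither a quote nor a newline. A newline outside quotes
    # therefore ends the current match.
    return re.findall(r'(?:"[^"]*"|[^"\n])+', text)

print(split_outside_quotes('v\n"x\ny"\nz'))
# ['v', '"x\ny"', 'z']
```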

Upvotes: 1

Nick Pandolfi

Reputation: 993

Putting double quotes inside a string in Python doesn't create nested strings. Python starts and ends the string literal at the outermost quotes, whichever kind they are. Any quote characters inside are treated as ordinary characters.

>>> print('dog')
dog
>>> print('"dog"')
"dog"

Note how in the second line, the quotes are also printed, because those actual quote-characters are a part of the string. No nesting happening.
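The same point can be checked on the sub-quote from the question. This is only a small sketch; the variable name `s` is chosen for illustration.

```python
s = '"x\ny"'

print(len(s))          # 5: the two quote characters count like any others
print(s[0] == '"')     # True
print(s.splitlines())  # ['"x', 'y"'] (the quotes do not protect the newline)
```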

Upvotes: 1

ignacioct

Reputation: 345

I came up with a hacky workaround that can tide you over while you look for a splitting method that doesn't behave the way Python's splitlines does.

double_quote_in_simple_quote = '"x\ny"'

double_quote_in_simple_quote = double_quote_in_simple_quote.replace("\n", "$n")

splitted_quote = double_quote_in_simple_quote.splitlines()
print(splitted_quote)

splitted_quote_decoded = [quote.replace('$n', '\n') for quote in splitted_quote]


print(splitted_quote_decoded)

The idea is to replace each \n with a placeholder that has no special meaning and does not otherwise occur in the string, split, and then reverse the replacement. I only used the sub-quote from your example, but I'm sure you will be able to tune it to fit your needs. My output was:

['"x$ny"']
['"x\ny"']
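To apply the placeholder idea to the full string from the question, only the newlines inside double quotes would need protecting. A minimal sketch under stated assumptions: the names `PLACEHOLDER` and `split_protecting_quotes` are hypothetical, the placeholder character is assumed never to occur in the input, and quotes are assumed to be balanced and unescaped.

```python
import re

PLACEHOLDER = "\x00"  # assumption: this character never occurs in the input

def split_protecting_quotes(text):
    # Hide only the newlines that sit between a pair of double quotes.
    protected = re.sub(
        r'"[^"]*"',
        lambda m: m.group(0).replace("\n", PLACEHOLDER),
        text,
    )
    # Split as usual, then put the hidden newlines back.
    return [line.replace(PLACEHOLDER, "\n") for line in protected.splitlines()]

print(split_protecting_quotes('v\n"x\ny"\nz'))
# ['v', '"x\ny"', 'z']
```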

Upvotes: 1

Raghav Gupta

Reputation: 464

It is due to the nature of escape sequences in Python.

\n in Python is the newline character. Wherever it appears in a string, Python treats it as a line break. The splitlines() method splits a string into a list at line breaks, which is why the resulting list contains no newline characters.
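A short sketch of that behaviour, using the string from the question plus a few made-up ones for comparison (the variable names are only illustrative):

```python
real = 'v\n"x\ny"\nz'   # contains three real newline characters
mixed = 'a\r\nb\rc'      # other line-break styles are recognised too
raw = r'"x\ny"'          # raw string: a backslash followed by the letter n

print(real.splitlines())      # ['v', '"x', 'y"', 'z']
print(mixed.splitlines())     # ['a', 'b', 'c']
print(len('\n'), len('\\n'))  # 1 2
print(raw.splitlines())       # ['"x\\ny"'] (no real newline, so no split)
```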

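Note that keepends=True does not prevent the split; it only keeps the line-break character at the end of each piece:

print(double_quote_in_simple_quote.splitlines(keepends=True))
>>> ['v\n', '"x\n', 'y"\n', 'z']

An output such as ['"x\\ny"'] appears only when the string holds a literal backslash followed by n (for example r'"x\ny"'), which splitlines does not treat as a line break.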

Upvotes: 1
