Reputation: 1948
I have some code which I think should return all parts of a Python statement that are not inside strings. However, I'm not sure it is as rigorous as I would like. Basically, it just finds the next string delimiter and stays in the "string" state until it is closed by the same delimiter. Is there anything wrong with what I have done for some weird case I haven't thought of? Will it be in any way inconsistent with what Python does?
# String delimiters in order of precedence
string_delims = ["'''", '"""', "'", '"']

# Get the non-string parts of a statement
def get_non_string(text):
    out = ""
    state = None
    while True:
        # not in string
        if state == None:
            vals = [text.find(s) for s in string_delims]
            # None will only be reached if all are -1 (i.e. no substring)
            for val, delim in zip(vals + [None], string_delims + [None]):
                if val == None:
                    out += text
                    return out
                if val >= 0:
                    i = val
                    state = delim
                    break
            out += text[:i]
            text = text[i + len(delim):]
        else:
            i = text.find(state)
            if i < 0:
                raise SyntaxError("Symbolic Subsystem: EOL while scanning string literal")
            text = text[i + len(delim)]
            state = None
Example Input:
get_non_string("hello'''everyone'''!' :)'''")
Example Output:
hello!
Upvotes: 4
Views: 103
Reputation: 76599
Your own code has problems with several cases, as you don't seem to be making any provisions for escaped quotes ("\"", """\"""", etc.).
Also:
get_non_string('""')
throws an error.
I would not describe those as weird cases.
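For concreteness, here is my reading of where each case goes wrong, assuming the question's get_non_string is in scope:
# Both inputs are valid Python string literals, but the posted code mishandles them
get_non_string(r'"\""')  # takes the escaped \" as the closing delimiter, so the
                         # real closing quote looks unmatched -> SyntaxError
get_non_string('""')     # IndexError: text[i + len(delim)] indexes one character
                         # past the end of the now-empty remainder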
Upvotes: 1
Reputation: 879481
Python can tokenize Python code:
import tokenize
import token
import io
import collections

class Token(collections.namedtuple('Token', 'num val start end line')):
    @property
    def name(self):
        return token.tok_name[self.num]

def get_non_string(text):
    result = []
    # generate_tokens wants a readline callable that yields text lines
    for tok in tokenize.generate_tokens(io.StringIO(text).readline):
        tok = Token(*tok)
        # print(tok.name, tok.val)
        if tok.name != 'STRING':
            result.append(tok.val)
    return ''.join(result)

print(get_non_string("hello'''everyone'''!' :)'''"))
yields
hello!
The heavy lifting is done by tokenize.generate_tokens.
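In current Python 3, generate_tokens already yields TokenInfo named tuples with type and string attributes, so the wrapper class is optional; a minimal sketch of the same idea:
import io
import tokenize

def get_non_string(text):
    # Drop STRING tokens, keep everything else, and rejoin the pieces
    return ''.join(tok.string
                   for tok in tokenize.generate_tokens(io.StringIO(text).readline)
                   if tok.type != tokenize.STRING)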
Upvotes: 3