Reputation: 1977
is it possible to use some number of spaces as a delimiter? what i mean is...
given some python operator-precedence parser, i want to mix natural language with operators, in a shorthand for taking notes, i.e. 'caffeine : A1 antagonist -> caffeine : peripheral stimulant' has the interpretation 'caffeine is an A1 antagonist implies that it is a peripheral stimulant'.
e.g. i want to be able to parse this parse('a:b -> c : d e')
as [[['a', ':', 'b'], '->', ['c', ':', ['d', 'e']]]]
with something like this
from pyparsing import Word, alphanums, delimitedList, opAssoc, operatorPrecedence

operands = delimitedList(Word(alphanums), delim=',')
# delim=' ' obviously doesn't work
precedence = [
    (":", 2, opAssoc.LEFT),
    ("->", 2, opAssoc.LEFT),
]
parser = operatorPrecedence(operands, precedence)
def parse(s): return parser.parseString(s, parseAll=True)
print(parse('a:b -> c : d e'))
possible?
Upvotes: 0
Views: 189
Reputation: 365945
After thinking it over, I think the language you're trying to define is ambiguous, but there are multiple ways to fix that.
You want this:
parse('a:b -> c : d e')
To give you this:
[[['a', ':', 'b'], '->', ['c', ':', ['d', 'e']]]]
You've implied that you want whitespace to act as an operator. But then why isn't it an operator in the context of 'c :'? What's the rule for when it is and when it isn't an operator?
Either that, or you want each operand to be a space-separated list of words. But in that case, why is that 'a' instead of ['a']? Either each of the operands is a list, or none of them are, right? It's clearly not position-dependent, and you haven't specified any other rule.
There is (at least) one plausible rule that fits what you have in mind: Collapse any operand that's a single-element list down to just that element. But that's a strange rule—and when you later use this parse tree for whatever purpose you're using it for, you have to effectively reverse the same rule, by writing code that handles a single word as if it were a one-word list. So… why do it that way?
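To make the cost of that rule concrete, here is a minimal sketch in plain Python (no pyparsing involved). The helper names `collapse` and `as_words` are made up for illustration; the point is only that whatever collapses one-word operands forces the consumer to undo it later.

```python
def collapse(operand):
    """Apply the 'collapse single-element lists' rule to one operand."""
    return operand[0] if len(operand) == 1 else operand


def as_words(operand):
    """Reverse the rule: treat a bare word as a one-word list."""
    return [operand] if isinstance(operand, str) else operand


# The two operands of 'c : d e', each as a list of words.
collapsed = [collapse(words) for words in (['c'], ['d', 'e'])]
print(collapsed)                         # ['c', ['d', 'e']]

# Any code that consumes the tree has to normalize again before using it.
print([as_words(x) for x in collapsed])  # [['c'], ['d', 'e']]
```

If every operand stayed a list in the first place, neither helper would be needed.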
I can think of three better alternatives:
Any of these are very easy to parse, and give you a parse tree that's very easy to use. I'd probably go with #2, but since I already explained how to do that in a comment above, let's do #3 here:
>>> from pyparsing import OneOrMore, Word, alphanums, opAssoc, operatorPrecedence
>>> operands = OneOrMore(Word(alphanums))
>>> precedence = [
... (":", 2, opAssoc.LEFT),
... ("->", 2, opAssoc.LEFT),
... ]
>>> parser = operatorPrecedence(operands, precedence)
>>> def parse(s): return parser.parseString(s, parseAll=True)
>>> print(parse('a:b -> c : d e'))
[[['a', ':', 'b'], '->', ['c', ':', 'd', 'e']]]
>>> print(parse('caffeine : A1 antagonist -> caffeine : peripheral stimulant'))
[[['caffeine', ':', 'A1', 'antagonist'], '->', ['caffeine', ':', 'peripheral', 'stimulant']]]
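As a rough illustration of using a flat tree like the one above, the sketch below splits one node back into its operands and operators. It assumes the parse result has already been converted to plain nested lists (for example with `asList()`), and `OPERATORS` and `split_operands` are hypothetical names for this example, not part of pyparsing.

```python
OPERATORS = {':', '->'}


def split_operands(node):
    """Split one flat node, e.g. ['c', ':', 'd', 'e'], into operands and operators.

    Every operand comes back as a list, so one-word and multi-word
    operands are handled the same way. Nested lists (sub-expressions)
    are kept as single items inside an operand.
    """
    operands, ops, current = [], [], []
    for item in node:
        if isinstance(item, str) and item in OPERATORS:
            operands.append(current)
            ops.append(item)
            current = []
        else:
            current.append(item)
    operands.append(current)
    return operands, ops


# Plain-list copy of the parse result shown above.
tree = [[['a', ':', 'b'], '->', ['c', ':', 'd', 'e']]]
implication = tree[0]
print(split_operands(implication[2]))
# ([['c'], ['d', 'e']], [':'])
```

This works because operands are restricted to alphanumeric words, so an operator token can never be mistaken for a word.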
Upvotes: 4