sam boosalis

Reputation: 1977

syntactic whitespaces with pyparsing's operatorPrecedence

is it possible to use some number of spaces as a delimiter? what i mean is...

given some python operator-precedence parser, i want to mix natural language with operators, in a shorthand for taking notes, i.e. 'caffeine : A1 antagonist -> caffeine : peripheral stimulant' has the interpretation 'caffeine is an A1 antagonist implies that it is a peripheral stimulant'.

e.g. i want parse('a:b -> c : d e') to return [[['a', ':', 'b'], '->', ['c', ':', ['d', 'e']]]]

with something like this

from pyparsing import Word, alphanums, delimitedList, opAssoc, operatorPrecedence

operands = delimitedList(Word(alphanums), delim=',')
# delim=' ' obviously doesn't work

precedence = [
    (":", 2, opAssoc.LEFT),
    ("->", 2, opAssoc.LEFT),
    ]

parser = operatorPrecedence(operands, precedence)

def parse(s):
    return parser.parseString(s, parseAll=True)

print(parse('a:b -> c : d e'))

possible?

Upvotes: 0

Views: 189

Answers (1)

abarnert

Reputation: 365945

After thinking it over, I think the language you're trying to define is ambiguous, but there are multiple ways to fix that.

You want this:

parse('a:b -> c : d e')

To give you this:

[[['a', ':', 'b'], '->', ['c', ':', ['d', 'e']]]]

You've implied that you want whitespace to act as an operator. But then why isn't it an operator in the context of 'c :'? What's the rule for when it is and when it isn't an operator?

Either that, or you want each operand to be a space-separated list of words. But in that case, why is that 'a' instead of ['a']? Either each of the operands is a list, or none of them are, right? It's clearly not position-dependent, and you haven't specified any other rule.

There is (at least) one plausible rule that fits what you have in mind: Collapse any operand that's a single-element list down to just that element. But that's a strange rule—and when you later use this parse tree for whatever purpose you're using it for, you have to effectively reverse the same rule, by writing code that handles a single word as if it were a one-word list. So… why do it that way?
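To make that cost concrete, here is a minimal pure-Python sketch of the collapsing rule and the reverse step a consumer would need. The helper names `collapse` and `as_words` are hypothetical, made up for illustration; they are not part of pyparsing.

```python
def collapse(operand):
    """Apply the rule: a one-word operand list becomes the bare word."""
    return operand[0] if len(operand) == 1 else operand


def as_words(operand):
    """Reverse the rule: treat a bare word as a one-word list."""
    return [operand] if isinstance(operand, str) else operand


# Operands of 'a:b -> c : d e', each as a space-separated word list.
operands = [['a'], ['b'], ['c'], ['d', 'e']]

collapsed = [collapse(op) for op in operands]
print(collapsed)  # ['a', 'b', 'c', ['d', 'e']]

# Any code that consumes the tree has to undo the collapse first.
restored = [as_words(op) for op in collapsed]
print(restored)   # [['a'], ['b'], ['c'], ['d', 'e']]
```

Every consumer of the collapsed tree ends up calling something like `as_words`, which is why keeping operands uniform is usually simpler.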

I can think of three better alternatives:

  1. Require every operand to be a space-delimited list of words.
  2. Allow spaces in the middle of operands.
  3. Use default whitespace handling, and allow multiple terms on each side of any operator.

Any of these is very easy to parse and gives you a parse tree that's very easy to use. I'd probably go with #2, but since I already explained how to do that in a comment above, let's do #3 here:

>>> from pyparsing import OneOrMore, Word, alphanums, opAssoc, operatorPrecedence
>>> operands = OneOrMore(Word(alphanums))
>>> precedence = [
...     (":", 2, opAssoc.LEFT),
...     ("->", 2, opAssoc.LEFT),
...     ]
>>> parser = operatorPrecedence(operands, precedence)
>>> def parse(s): return parser.parseString(s, parseAll=True)
>>> print(parse('a:b -> c : d e'))
[[['a', ':', 'b'], '->', ['c', ':', 'd', 'e']]]
>>> print(parse('caffeine : A1 antagonist -> caffeine : peripheral stimulant'))
[[['caffeine', ':', 'A1', 'antagonist'], '->', ['caffeine', ':', 'peripheral', 'stimulant']]]
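For comparison, alternative #1 (every operand is a list of words) might be sketched by wrapping the operand in `Group`. This is only a sketch under a couple of assumptions: it uses `infixNotation`, the newer name for `operatorPrecedence`, so it assumes a pyparsing version that provides it, and the name `parse_grouped` is made up here for illustration.

```python
from pyparsing import Group, OneOrMore, Word, alphanums, infixNotation, opAssoc

# Every operand becomes its own list of words, even a single word.
operand = Group(OneOrMore(Word(alphanums)))

precedence = [
    (":", 2, opAssoc.LEFT),   # listed first, so it binds tighter
    ("->", 2, opAssoc.LEFT),
]

grouped_parser = infixNotation(operand, precedence)


def parse_grouped(s):
    """Parse the whole string and return plain nested lists."""
    return grouped_parser.parseString(s, parseAll=True).asList()


print(parse_grouped('a:b -> c : d e'))
# [[[['a'], ':', ['b']], '->', [['c'], ':', ['d', 'e']]]]
```

The tree is more deeply nested than with #3, but each side of an operator is always a list, so downstream code never needs a special case for single words.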

Upvotes: 4
