Reputation: 1
I want to code a BinaryTree
parser. I don't know how to solve this problem. I've tried using regular expressions recursively but I can't find good resources. My goal is:
BinaryTree.from_string("('a') 'b' ('c')")
--> BinaryTree("a", "b", "c")
BinaryTree.from_string("")
--> None
BinaryTree.from_string("() ()")
--> BinaryTree(None, None, None)
BinaryTree.from_string("((1) 2 (3)) 4 (5)")
--> BinaryTree(BinaryTree(1, 2, 3), 4, 5)
Here is some source code:
class BinaryTree:
def __init__(self, left=None, name=None, right=None):
self.left = left
self.name = name
self.right = right
def __str__(self):
return f"({self.left}) {self.name} ({self.right})"
def __repr__(self):
return f"BinaryTree({repr(self.left)}, {repr(self.name)}, {repr(self.right)})"
def __len__(self):
if self.name is not None:
output = 1
else:
output = 0
if self.left is not None:
output += len(self.left)
if self.right is not None:
output += len(self.right)
return output
@staticmethod
def from_string(string):
# "(x) y (z)" --> BinaryTree("x", "y", "z")
# "((a) b (c)) y (z)" --> BinaryTree(BinaryTree("a", "b", "c"), "y", "z")
# "" --> None
# () () --> BinaryTree("", "", "")
pass
Upvotes: 0
Views: 669
Reputation: 350272
You can use regular expressions, but only use it to tokenize the input, i.e. it should match all parentheses as individual matches, and match quoted or non-quoted literals. Some care has to be taken to support backslash escaping inside quoted substrings.
For converting quoted strings and numbers to the corresponding Python data typed values, you can use ast.literal_eval
. Of course, if the input format would have been a valid Python expression (using comma separators, ...etc), you could leave the parsing completely to ast.literal_eval
. But as this is not the case, you'll have to tokenize the input and iterate over the tokens.
So import these:
import re
import ast
And then:
@staticmethod
def from_string(string):
tokens = re.finditer(r"[()]|'(?:\\.|[^'])*'|[^\s()]+|$", string)
def format(token):
return f"'{token}'" if token else "end of input"
def take(end, expect=None, forbid=None):
token = next(tokens).group(0)
if expect is not None and token != expect:
raise ValueError("Expected {}, got {}".format(format(expect), format(token)))
if end != "" and token == "" or token == forbid:
raise ValueError("Unexpected {}".format(format(token)))
if token not in ("(", ")", ""):
token = ast.literal_eval(token)
return token
def recur(end=")"):
token = take(end)
if token == end: # it is an empty leaf
return None
if token != "(": # it is a leaf
take(end, end)
return token
# It is a (left)-name-(right) sequence:
left = recur()
name = None
token = take(end, None, end)
if token != "(":
name = token
take(end, "(")
right = recur()
take(end, end)
return BinaryTree(left, name, right)
return recur("")
Upvotes: 0
Reputation: 77847
First, I believe that you need to drop the idea of regular expressions and concentrate on simple matching the parentheses. You have a very simple expression grammar here. Rather than reproducing such a well-traveled exercise, I simply direct your to research how to parse a binary tree expression with parentheses.
The basic expression is
left root right
where each of left
and right
is either
Note that you have some ambiguities. For instance, given a b
, is the resulting tree (a, b, None)
, (None, a, b)
, or an error?
In any case, if you focus on simple string processing, you should be able to do this without external packages:
Can you take it from there?
Upvotes: 1