Reputation: 1
I'm starting with Python and for a design I've to validate a string that must have this format:
aaa...a aaa...a(bbb...b) aaa...a(bbb...b)ccc...c aaa...a(bbb...b)ccc...c(ddd...d)
where aaa..a, bbb...b, ccc..c, ddd..d are integer number.
The length of the string shall be arbitrary.
There are no spaces into the string.
Only single bracket allowed.
I've approached the problem as a finite state machine with two states.
I like to know if there is a best approach to solve this task and your impressions about it and every your hint.
Just as side information I've did some test by means of regexp but this seems to me a recursive pattern validation issue and I'm not sure that can be easily do in Python, but I'm not a expert into regexp, but I suppose that if this task should be possible may be executed with one single row of code.
Tha main advantage that I can see with the fsm approach is to notify to the user where a error is located into the input string and then make more easy (from the user perspective) the checking and correction task.
[EDIT] I've discovered a wrong detecting behaviour and now the code was corrected, are not allowed two consecutive group of brackt e.g. 10(200)(300). Also I've reformatted the code as a function.
""" String parser for string formatted as reported below: aaa...a aaa...a(bbb...b) aaa...a(bbb...b)ccc...c(ddd...d) where: aaa...a, bbb...b = integer number Not valid (some example) () (aaa...a) aaa...a() aaa...a(bbb...b)ccc...d aaa...a((bbb....b)) """ import sys import re def parse_string(buffer): # Checking loop state = 1 old_state = 1 next_state = 1 strlen = len(buffer) initial = True success = False is_a_number = re.compile("[0-9]") for index, i in enumerate(buffer): car = i # State 1 if (state == 1): if is_a_number.match(car): if (index != strlen-1): # If is a number e not the last I've to wait for the next char "(" or number next_state = 1 else: if (initial): # If is a number and is also the last of the initial block -> I've finish to parse success = True break else: # Is the last number but not into the initial block of numbers -> error success = False break else: if (car == "("): if (old_state == 2): # Can't have two (...)(...) consecutively success = False break if ((index == 0) or (index == strlen-1)): # The ( can't be the first or the last char success = False break else: # Step to the next state next_state = 2 initial = False else: # Wrong char detected success = False break if (state == 2): if is_a_number.match(car): if (index != strlen-1): # The char is a number and is not the last of the string next_state = 2 else: # If is a number and is also the last I've a error due to a missing ")" success = False break else: if (car == ")"): if (old_state == 1): # The sequence () is not allowed success = False break elif ((old_state == 2) and (index != strlen-1)): # The previous char was a number next_state = 1 else: # I'm on the last char of the string success = True break else: # Wrong char detected success = False break print("current state: "+ str(state) + " next_state: " + str(next_state)) # Update the old and the new state old_state = state state = next_state return(success, state, index) if __name__ == "__main__": # Get the string from the command line # The first argument (index = 0) is the script name, the supplied parameters start from the idex = 1 number_cmd = len(sys.argv) - 1 if (number_cmd != 1): print ("Error: request one string as input!") sys.exit(0) # Get the string buffer = sys.argv[1].strip() print("================================") print("Parsing: " + buffer) print("Checking with fsm") print("--------------------------------") # Parse the string success, state, index = parse_string(buffer) # Check result if (success): print("String validated!") print("================================") else: print("Syntax error detected in state: " + str(state) + "\n" + "position: " + str(buffer[:index+1])) print("================================") # Exit from script sys.exit(0)
Upvotes: 0
Views: 882
Reputation: 2004
This can be done with regular expression. Here is an example in python which you also can try out on regex101:
Regular Expression Pattern: (\d+)(\(\d+\)(\d+(\(\d+\))?)?)?
And this would be the python code:
import re
p = re.compile(ur'(\d+)(\(\d+\)(\d+(\(\d+\))?)?)?')
test_str = u"1000(20)30(345)"
re.match(p, test_str)
If you want to make check for example following input 1000(20)30(345)
you can add a ^
before the regex and a $
at the end.
Upvotes: 0
Reputation: 18513
Finite state machines and regular expressions are equivalent in expressive power. They both can be used to parse regular languages. So if your problem can be solved with a FSM, it can also be solved with a regular expression.
If recursive parentheses are allowed, like 1(123(345)12)
, then it is not a regular language, neither FSM nor regex can parse the string. But from your description and script, I guess recursive parentheses are not allowed. Regular expression can work.
Your requirements:
To get the precise location of error, you cannot use one regex to match the whole string. You can use regex \(|\)
to split the string, and [0-9]+
to match each segment. Then, you only need to make sure the parentheses match.
Here is my script:
import re
def parse_input(s):
s = s.strip()
digits = re.compile("[0-9]+")
segments = re.split("(\(|\))",s)
if not segments:
print "Error: blank input"
return False
if not segments[0]: # opens with parentheses
print "Error: cannot open with parenthese"
return False
in_p = False
def get_error_context(i):
prefix = segments[i-1] if i>0 else ""
suffix = segments[i+1] if i<len(segments)-1 else ""
return prefix + segments[i] + suffix
for i, segment in enumerate(segments):
if not segment: # blank is not allowed within parentheses
if in_p:
print "Error: empty parentheses not allowed, around '%s'"%get_error_context(i)
return False
else:
print "Error: no digits between ) and (, around '%s'"%get_error_context(i)
return False
elif segment == "(":
if in_p:
print "Error: recursive () not allowed, around '%s'"%get_error_context(i)
return False
else:
in_p = True
elif segment == ")":
if in_p:
in_p = False
else:
print "Error: ) with no matching (, around '%s'"%get_error_context(i)
return False
elif not digits.match(segment):
print "Error: non digits, around '%s'"%get_error_context(i)
return False
if in_p:
print "Error: input ends with an open parenthese, around '%s'"%get_error_context(i)
return False
return True
And tests:
>>> parse_input("12(345435)4332(34)")
True
>>> parse_input("(345435)4332(34)")
Error: cannot open with parenthese
False
>>> parse_input("sdf(345435)4332()")
Error: non digits, around 'sdf('
False
>>> parse_input("123(345435)4332()")
Error: empty parentheses not allowed, around '()'
False
>>> parse_input("34324(345435)(34)")
Error: no digits between ) and (, around ')('
False
>>> parse_input("123(344332()")
Error: recursive () not allowed, around '344332('
False
>>> parse_input("12)3(3443)32(123")
Error: ) with no matching (, around '12)3'
False
>>> parse_input("123(3443)32(123")
Error: input ends with an open parenthese, around '(123'
False
Upvotes: 2