Python evaluate a string (Would Regular Expression work?)

Basically what I am trying to accomplish is whenever a number precedes a string within square brackets, the string within the square brackets should repeat as many times. I am learning regular expression and I posted a question how I can come up with a regular expression for this scenario and folks were kind enough to get me what I wanted. Basically the code below parses the string based on the regular expression and get me a list with all square brackets removed so that I can iterate through the list and get myself the output I want.
Given a string s, the code I have is below. But this does not work in all scenarios. This however works when there are no nested square brackets.

import re
s = "abc3[cd]xyz"
s_final = ""
res = re.findall(r'\d+|[^\W\d_]+', s)
print(res)
for i in range(len(res)):
    if res[i].isnumeric():
        s_final+=int(res[i])*res[i+1]
    else:
        if res[i] not in s_final:
            s_final += res[i]
print(s_final)

if I change the string to

s = "3[a2[c]]" the expected output is accaccacc as the inner square brackets gets evaluated first and then the outer square brackets. But what I get is aaacc which is not wrong based on the code I have. But anyone have any insights on how I can evaluate the string correctly?

Thanks!

Upvotes: 1

Views: 291

Answers (3)

PaulMcG
PaulMcG

Reputation: 63749

Regardless of what parsing tools are used, I always recommend that developers start by writing a BNF (Backus-Naur Form) for their parser, as a roadmap for the development to follow. Here is what I've gathered from your problem description (where '+' means "one or more"):

string_expr := (letter | repetition)+
repetition := integer '[' string_expr ']'
letter := character a-z
integer := (character 0-9)+

At this point, walk through the BNF with your test samples, and make sure things look correct.

You can see that this is a recursive grammar, since string_expr uses repetition in its definition, but repetition uses string_expr.

Pyparsing is written to make it easy to map these BNF definitions to pyparsing objects. Follow this annotated code: (need to work bottom-up in implementing your grammar)

import pyparsing as pp

# define []'s, but we'll use pyparsing's Suppress since they won't be
# needed in the actual expression after parsing is done
LBRACK, RBRACK = map(pp.Suppress, "[]")

# a letter will be a single character from the list of lowercase letters
# defined in the Python string module
letter = pp.Char(string.ascii_lowercase)

# an integer will be one or more numeric digits - attach a parse
# action to convert parsed numeric strings into actual Python ints
integer = pp.Word("0123456789").addParseAction(lambda t: int(t[0]))

# we would like to write 'repetition' now, but we haven't defined
# 'string_expr' yet. We can't define that yet because 'repetition' isn't
# defined yet. So let's define 'string_expr' using a Pyparsing 
# placeholder - a Forward
string_expr = pp.Forward()
repetition = (integer("multiplier") 
              + LBRACK 
              + string_expr("content") 
              + RBRACK)

# now that repetition is defined, we can "insert" the definition of 
# 'string_expr' into the placeholder we created earlier
string_expr <<= (letter | repetition)[...]

# two more parse actions needed:
# - one to do the multiplication of multiplier*content in repetition
# - one to join all the pieces together in string_expr
repetition.addParseAction(lambda t: t.multiplier * list(t.content))
string_expr.addParseAction("".join)

Now test it out using the runTests() method pyparsing defines on parser elements:

string_expr.runTests("""\
    abc3[cd]xyz
    3[a2[c]]
    3[a2[cb]]
    """)

Gives:

abc3[cd]xyz
['abccdcdcdxyz']

3[a2[c]]
['accaccacc']

3[a2[cb]]
['acbcbacbcbacbcb']

Upvotes: 1

Ajax1234
Ajax1234

Reputation: 71461

You can use recursion to build the full string, after first parsing the characters:

import re
s = "3[a2[c]]"
def build_string(d):
  while (n:=next(d, None)) is not None and n != ']':
     yield n if not (j:=re.findall('\d+$', n)) else n[:-1*len(j[0])]
     if j:
        _ = next(d)
        yield ''.join(build_string(d))*int(j[0])

def get_result(s):
   return ''.join(build_string(iter(re.findall('\w+|\[|\]', s))))
      
print(get_result(s))

Output:

'accaccacc'

Upvotes: 1

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 627082

You can use

import re
s = "3[a2[c]]"
s_final = s
rx = r'(\d+)\[([^][]*)]'
n = 1 
while n:
    s_final, n = re.subn(rx, lambda x: x.group(2) * int(x.group(1)), s_final)

print(s_final) # => accaccacc

See the Python demo.

The (\d+)\[([^][]*)] regex matches and captures into Group 1 any one or more digits, and then matches a [ char, captures into Group 2 any text other than [ and ] chars up to the closest ] char.

The n = 1 is a flag that makes re.subn run at least once.

Then, the while block is executed to find all the occurrences of the pattern and replace with Group 1 value repeated Group 2 times.

Upvotes: 0

Related Questions