Ahmed4end

Reputation: 324

Greedy and non-greedy regex matching aren't enough

I'm having trouble extracting substrings from a string with re.findall(). A quantifier can only be greedy or non-greedy, but I also want every match that falls between those two extremes. For example:

import re
text = "f(s(5)+5)+f(12)"
regex = re.findall(r"f\(.*\)", text)

>>>['f(s(5)+5)+f(12)']

This is greedy and matches the whole string. Another example:

import re
text = "f(s(5)+5)+f(12)"
regex = re.findall(r"f\(.*?\)", text)

>>>['f(s(5)', 'f(12)']

This is non-greedy and matches some parts, but not enough. I want all the greedy and non-greedy matches plus every match in between, like:

>>> ['f(s(5)', 'f(s(5)+5)', 'f(12)', 'f(s(5)+5)+f(12)']

Notice that one match, 'f(s(5)+5)', is missing from both the greedy and the non-greedy results, and more than one would be missing if the string were longer.

Upvotes: 0

Views: 124

Answers (1)

dhanlin

Reputation: 145

Yeah, as everyone has already said, there is no single regex that would give you the desired output.

But by running a regex in a loop, I was able to get your desired output. See if it helps.

import re
text = "f(s(5)+5)+f(12)"
print("occurrences of ')' : {}".format(text.count(")")))

test_str = text
# loop repeatedly until all substrings starting with 'f(' are parsed
while test_str:
    # for loop: extend the match to the 1st, 2nd, ... i-th ')'
    for i in range(1, test_str.count(")") + 1):
        # regex explanation can be found @ https://regex101.com/r/jJOXr0/1/
        regex = r'^f\((?:.*?\)){' + str(i) + r'}'
        output_list = re.findall(regex, test_str)
        print(output_list[0])

    # find the next substring starting with 'f('
    substr_id = test_str.find('f(', 1)
    if substr_id == -1:
        break
    else:
        test_str = test_str[substr_id:]


Output :
occurrences of ')' : 3
f(s(5)
f(s(5)+5)
f(s(5)+5)+f(12)
f(12)
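
If you would rather collect the matches in a list than print them, the same idea can be sketched as a small function. This is only a rough sketch: the name all_spans and its start/end parameters are made up for illustration, and it assumes you want every substring that runs from an 'f(' to any later ')'. It records the positions with re.finditer instead of rebuilding the pattern on each pass.

```python
import re

def all_spans(text, start=r"f\(", end=r"\)"):
    """Return every substring that begins at a `start` match
    and ends at some later `end` match (hypothetical helper)."""
    # Positions of every closing delimiter, found once up front.
    end_matches = list(re.finditer(end, text))
    results = []
    for s in re.finditer(start, text):
        for e in end_matches:
            # Only keep closers that come after the opening 'f('.
            if e.start() >= s.end():
                results.append(text[s.start():e.end()])
    return results

print(all_spans("f(s(5)+5)+f(12)"))
# ['f(s(5)', 'f(s(5)+5)', 'f(s(5)+5)+f(12)', 'f(12)']
```

Like the loop above, this does not check that the parentheses are balanced; it just pairs each 'f(' with each ')' that follows it.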

Upvotes: 1
