Ashfaque Ahammed

Reputation: 73

Python regex finding sub-string

I'm new to Python and regex. I'm trying to extract the text between two limits: the start is an instruction mnemonic (mov/add/rd/sub/and, etc.) and the end is the end of the line.

/********** sample input text file *************/
f0004030:   a0 10 20 02     mov  %l0, %psr
//some unwanted lines
f0004034:   90 04 20 03     add  %l0, 3, %o0
f0004038:   93 48 00 00     rd  %psr, %o1
f000403c:   a0 10 3f fe     sub  %o5, %l0, %g1

/*-------- Here is the code -----------*/
    # requires: import re, sys
    try:
        objdump = open(dest + name, "r")
    except IOError:
        print("Error: '" + name + "' not found in " + dest)
        sys.exit()
    objdump_file = objdump.readlines()
    a = ['add', 'mov', 'sub', 'rd', 'and']
    for objdump_line in objdump_file:
        if any(x in objdump_line for x in a):   # To avoid unwanted lines
            # >>>>>>>>>> Here is the problem >>>>>>>>>>
            m = re.findall('(add|mov|rd|sub|add)(.*?)($|\n)', objdump_line, re.DOTALL)
            # <<<<<<<<<< Here is the problem <<<<<<<<<<
            print(m)

/*---------- Result I'm getting --------------*/
    [('mov', '  %l0, %psr', '')]
    [('add', '  %l0, 3, %o0', '')]
    [('rd', '  %psr, %o1', '')]
    [('sub', '  %o5, %l0, %g1', '')]

/*----------- Expected result ----------------*/
    ['  %l0, %psr']
    ['  %l0, 3, %o0']
    ['  %psr, %o1']
    ['  %o5, %l0, %g1']

I have no idea why the parentheses and the extra quoted strings appear. Thanks in advance.

Upvotes: 1

Views: 91

Answers (2)

rock321987

Reputation: 11032

Quoting the Python documentation for re.findall:

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
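To make the quoted rule concrete, here is a small sketch of the three result shapes findall can produce. The sample line is taken from the question; the simplified patterns are only for illustration and are not the asker's exact regex.

```python
import re

line = "f0004034:   90 04 20 03     add  %l0, 3, %o0\n"

# No groups: each item is the whole match.
no_groups = re.findall(r"add.*", line)

# One group: each item is just that group's text.
one_group = re.findall(r"add(.*)", line)

# Several groups: each item is a tuple with one entry per group.
many_groups = re.findall(r"(add)(.*)", line)

print(no_groups)    # ['add  %l0, 3, %o0']
print(one_group)    # ['  %l0, 3, %o0']
print(many_groups)  # [('add', '  %l0, 3, %o0')]
```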

When the pattern contains several capturing groups, findall returns a list with one tuple per match, and each tuple holds the text of every group in that match. That is where the parentheses and the extra quoted strings come from. You can pick out the part you want with

re.findall ('(add|mov|rd|sub|add)(.*?)($|\n)', objdump_line, re.DOTALL)[0][1]
Here [0] selects the first match (a tuple) and [1] selects the second group inside that tuple, which is the operand text; the other groups are ignored.

Each capturing group captures whatever its parenthesized sub-pattern matched. The last group, ($|\n), matches at the end of the line without consuming any text, so it yields an empty string ''.

You mentioned in your comment that you tried this:

add(.*?)$

Instead, try this:

(add)(.*?)$

Each () is a capturing group, so findall returns both the mnemonic and the text that follows it.
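One way to get exactly the list shape asked for in the question is to keep only the operand part as a capturing group and make the mnemonic alternation non-capturing with (?:...). The sketch below assumes the mnemonic list from the question; the \b word boundaries and the names PATTERN and sample are additions for illustration, not part of the original code.

```python
import re

# Only the operands are captured; the mnemonics sit in a non-capturing group.
# Without re.DOTALL, '.' stops at the newline, so no end-of-line group is needed.
PATTERN = re.compile(r"\b(?:add|mov|rd|sub|and)\b(.*)")

sample = [
    "f0004030:   a0 10 20 02     mov  %l0, %psr\n",
    "//some unwanted lines\n",
    "f0004034:   90 04 20 03     add  %l0, 3, %o0\n",
    "f0004038:   93 48 00 00     rd  %psr, %o1\n",
    "f000403c:   a0 10 3f fe     sub  %o5, %l0, %g1\n",
]

results = [PATTERN.findall(line) for line in sample]

for found in results:
    if found:  # lines without a mnemonic give an empty list
        print(found)
```

With a single capturing group, findall returns a plain list of strings, so the separate any(...) filter from the question is no longer needed to avoid tuples.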

Upvotes: 1

midori

Reputation: 4837

If you use grouping in findall, it returns all captured groups. If you want only specific parts, use slicing:

m = re.findall ('(add|mov|rd|sub|add)(.*?)($|\n)', objdump_line, re.DOTALL)[0][-2:-1]

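Additionally, you can solve your problem without regex. You are already checking whether the string contains any of ['add', 'mov','sub','rd', 'and'], so you can split the string and pick the last two elements. Note that this only fits lines with exactly two operands; a three-operand line such as the add example would lose its first operand: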

m = ' '.join(objdump_line.split()[-2:])
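A variant of the split-based idea that keeps every operand, whatever their count, might look like the sketch below. It assumes the mnemonic appears as its own whitespace-separated token; the function name operands and the constant MNEMONICS are hypothetical names introduced here.

```python
MNEMONICS = {"add", "mov", "sub", "rd", "and"}

def operands(objdump_line):
    """Return the text after the first mnemonic token, or None if there is none."""
    tokens = objdump_line.split()
    for index, token in enumerate(tokens):
        if token in MNEMONICS:
            # Re-join everything after the mnemonic with single spaces.
            return " ".join(tokens[index + 1:])
    return None

print(operands("f0004034:   90 04 20 03     add  %l0, 3, %o0\n"))  # %l0, 3, %o0
print(operands("//some unwanted lines\n"))                          # None
```

Comparing whole tokens rather than substrings avoids accidental hits inside longer words, although a comment line that contains a bare word such as "and" would still be picked up.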

Upvotes: 1
