Reputation: 73
I'm new to Python and regex. I'm trying to extract the text between two limits: the start is an instruction mnemonic (mov/add/rd/sub/and/etc.) and the end is the end of the line.
/********** sample input text file *************/
f0004030: a0 10 20 02 mov %l0, %psr
//some unwanted lines
f0004034: 90 04 20 03 add %l0, 3, %o0
f0004038: 93 48 00 00 rd %psr, %o1
f000403c: a0 10 3f fe sub %o5, %l0, %g1
/*-------- Here is the code -----------*/
import re
import sys

try:
    objdump = open(dest+name,"r")
except IOError:
    print "Error: '" + name + "' not found in " + dest
    sys.exit()
objdump_file = objdump.readlines()
for objdump_line in objdump_file:
    a = ['add', 'mov','sub','rd', 'and']
    if any(x in objdump_line for x in a):  # To avoid unwanted lines
        # >>>>>>>>>> Here is the problem >>>>>>>>>>>>>
        m = re.findall ('(add|mov|rd|sub|add)(.*?)($|\n)', objdump_line, re.DOTALL)
        # <<<<<<<<<<< Here is the problem <<<<<<<<<<<<<
        print m
/*---------- Result I'm getting --------------*/
[('mov', ' %l0, %psr', '')]
[('add', ' %l0, 3, %o0', '')]
[('rd', ' %psr, %o1', '')]
[('sub', ' %o5, %l0, %g1', '')]
/*----------- Expected result ----------------*/
[' %l0, %psr']
[' %l0, 3, %o0']
[' %psr, %o1']
[' %o5, %l0, %g1']
I have no idea why the parentheses and the unwanted quotes appear. Thanks in advance.
Upvotes: 1
Views: 91
Reputation: 11032
Quoting the Python documentation for re.findall:
Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
The outer list holds one entry per match, and each entry is a tuple containing every captured group of that match. There can be several matches. You can access the part you want as
re.findall ('(add|mov|rd|sub|add)(.*?)($|\n)', objdump_line, re.DOTALL)[0][1]
Here 0 selects the first match and 1 selects the second captured group of that match (index 0 would be the mnemonic), since you do not want the other elements.
Each capturing group captures the text matched by the expression between its parentheses. The last capturing group, ($|\n), matches the end of the line without consuming any text, so you get an empty ''.
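To make the quoted rule concrete, here is a minimal sketch of how the return value of findall changes with the number of groups. It uses one line from the question's sample input; the variable names are only illustrative.

```python
import re

line = "f0004034: 90 04 20 03 add %l0, 3, %o0"

# No capturing group: findall returns the whole match as a string.
no_groups = re.findall(r'add.*$', line)

# Exactly one capturing group: findall returns only that group's text.
one_group = re.findall(r'add(.*)$', line)

# Several capturing groups: findall returns one tuple per match.
many_groups = re.findall(r'(add)(.*?)($|\n)', line)

print(no_groups)
print(one_group)
print(many_groups)
```

The third form is the one from the question, which is why each result there is a tuple with an empty string at the end.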
As you mentioned in your comment, you were using this:
add(.*?)$
Instead, try this:
(add)(.*?)$
The () indicates a capturing group, and you will get the result as expected.
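One possible way to get exactly the list shown under "Expected result" is to leave only the operand part in a capturing group and make the alternation non-capturing with (?:...). This is a sketch, not the only fix: the helper name operands is hypothetical, the \b word boundaries are an added assumption to avoid matching a mnemonic inside a longer word, and the alternation lists 'and' where the question's pattern repeated 'add'.

```python
import re

# (?:...) groups the alternatives without capturing them, so the
# pattern has a single capturing group and findall returns strings.
PATTERN = re.compile(r'\b(?:add|mov|rd|sub|and)\b(.*)$')

def operands(line):
    """Return the text after the mnemonic as a list (empty if no match)."""
    return PATTERN.findall(line)

sample = [
    "f0004030: a0 10 20 02 mov %l0, %psr\n",
    "//some unwanted lines\n",
    "f0004034: 90 04 20 03 add %l0, 3, %o0\n",
]
for line in sample:
    result = operands(line)
    if result:  # skip lines without a known mnemonic
        print(result)
```

Because lines without a mnemonic yield an empty list, the separate any(...) pre-check from the question is no longer needed with this pattern.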
Upvotes: 1
Reputation: 4837
If you use grouping in findall, it returns all captured groups. If you want only specific parts, use slicing:
m = re.findall ('(add|mov|rd|sub|add)(.*?)($|\n)', objdump_line, re.DOTALL)[0][-2:-1]
Additionally, you can solve your problem without regex. You are already checking whether the string contains any of ['add', 'mov','sub','rd', 'and'], so you can split the string and pick the last two elements:
m = ' '.join(objdump_line.split()[-2:])
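Note that taking the last two tokens only covers instructions with exactly two operands; for a line such as the add in the sample input it would drop the first operand. A hedged variant of the same split-based idea is to locate the mnemonic token and keep everything after it. The function name operands_without_regex and the MNEMONICS set are hypothetical names introduced for this sketch.

```python
MNEMONICS = {'add', 'mov', 'sub', 'rd', 'and'}

def operands_without_regex(line):
    """Join every token that follows the first mnemonic token.

    Returns None when the line contains no known mnemonic.
    """
    tokens = line.split()
    for i, tok in enumerate(tokens):
        if tok in MNEMONICS:
            return ' '.join(tokens[i + 1:])
    return None

print(operands_without_regex("f0004034: 90 04 20 03 add %l0, 3, %o0\n"))
```

Unlike the regex result, this returns the operands without the leading space, because split() discards whitespace.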
Upvotes: 1