Reputation: 694
I am learning regular expressions and trying to do the following:
Below is the format of a series of alphanumeric (hexadecimal) digits. Each line starts with 4 digits followed by 2 spaces, then four groups of 8 digits, each followed by a single space, and finally an OR bar (|).
FFFF  FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF |
or written more clearly:
FFFF[space][space]FFFFFFFF[space]FFFFFFFF[space]FFFFFFFF[space]FFFFFFFF[space]|
I first need to find the correct line in a file, which starts with 00A3. Then I need to pull out the four 8-digit groups, using something like:
import re

p = re.compile(r'00A3  ')         # search for 00A3[space][space]
r = re.compile(???)               # search for desired 8 digit groups
q = re.compile(r'\[SECTION2\]')
dataString = next(inFile)         # 00A3  388A63FF 00000DF1 52984731 FF989ACB |
while not q.match(dataString):    # look for [SECTION2] line in file. This means we passed where 00A3 would have been so it must not be here.
    if p.match(dataString):
        numbers = r.findall(dataString)  # numbers = ['388A63FF', '00000DF1', '52984731', 'FF989ACB']
        break
    dataString = next(inFile)     # get next line to check
This should give me a list of the numbers for further processing. I'm just not sure how to write the regex that will find only the 4 groups of 8 alphanumeric digits separated by a space. My thought was to look for 8 alphanumeric digits together with a space in front and a space behind, but would that cause a problem, and how would that look?
I looked into the lookahead and lookbehind options, but I get confused.
I am still very new to this, especially in Python so I am open to suggestions on better implementation.
Thanks!
Upvotes: 3
Views: 597
Reputation: 4776
You could use one regular expression for a single 8-digit group and then find all the matches in a line.
import re
line = dataString  # the line to search, e.g. the 00A3 line from the question
regex = re.compile(r' (\w{8})')
groups = regex.findall(line)  # gives a list of the matches in order of appearance
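As a quick check, here is a minimal sketch that applies this pattern to the sample line from the question. The variable name `sample` is only illustrative, and the two spaces after 00A3 are assumed from the question's description of the format.

```python
import re

# Sample line from the question; the two spaces after 00A3 are assumed.
sample = "00A3  388A63FF 00000DF1 52984731 FF989ACB |"

# A space followed by exactly eight word characters; only the eight
# characters are captured, so findall returns them without the space.
regex = re.compile(r' (\w{8})')
groups = regex.findall(sample)
print(groups)  # ['388A63FF', '00000DF1', '52984731', 'FF989ACB']
```

The leading 00A3 is skipped because it is not preceded by a space and is only four characters long.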
Upvotes: 2
Reputation: 3077
If you are using findall, you should be OK with
\w{8}
It matches every run of 8 word characters, which on this line are exactly the hex numbers that are 8 digits long.
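Because \w also matches letters outside A-F and the underscore, a stricter variant may be safer on lines that contain other text. The sketch below is one possible tightening, not part of the answer above: the names `HEX8` and `hex_groups` are hypothetical, and it assumes the groups are always delimited by word boundaries.

```python
import re

# Hypothetical stricter pattern: exactly eight hex digits, bounded on both
# sides so it cannot match part of a longer word.
HEX8 = re.compile(r'\b[0-9A-Fa-f]{8}\b')

def hex_groups(line):
    """Return every standalone 8-digit hex group found in the line."""
    return HEX8.findall(line)

print(hex_groups("00A3  388A63FF 00000DF1 52984731 FF989ACB |"))
# ['388A63FF', '00000DF1', '52984731', 'FF989ACB']

# Plain \w{8} also picks up ordinary words of eight or more characters.
print(re.findall(r'\w{8}', "[SECTION2] description"))  # ['SECTION2', 'descript']
print(hex_groups("[SECTION2] description"))            # []
```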
Upvotes: 1
Reputation: 1788
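Re: differing implementations, you can skip regular expressions and split each line on whitespace:
all_numbers = []
with open('input', 'r') as my_file:
    for line in my_file:
        fields = line.split()
        # Skip blank lines; keep the four groups that follow a leading 00A3
        if fields and fields[0] == "00A3":
            numbers = fields[1:5]
            all_numbers.append(numbers)
Here numbers looks like ['388A63FF', '00000DF1', '52984731', 'FF989ACB'], and all_numbers is just a list of the numbers found.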
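The same idea can be wrapped in a small function that also stops at the [SECTION2] marker, as the loop in the question does. This is only a sketch: `find_00a3_numbers` is a hypothetical name, and the in-memory `demo` file stands in for the real input, whose other lines are assumed here.

```python
import io

def find_00a3_numbers(lines):
    """Return the four groups from the first 00A3 line, or None if
    [SECTION2] (or the end of the input) is reached first."""
    for line in lines:
        if line.startswith("[SECTION2]"):
            break  # past the point where 00A3 could have appeared
        fields = line.split()
        if fields and fields[0] == "00A3":
            return fields[1:5]
    return None

# Illustrative input; only the 00A3 line is taken from the question.
demo = io.StringIO(
    "0001  00000000 00000000 00000000 00000000 |\n"
    "00A3  388A63FF 00000DF1 52984731 FF989ACB |\n"
    "[SECTION2]\n"
)
result = find_00a3_numbers(demo)
print(result)  # ['388A63FF', '00000DF1', '52984731', 'FF989ACB']
```

Any iterable of lines works here, so an open file object can be passed in place of `demo`.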
Upvotes: 0
Reputation: 129477
You can indeed use lookarounds. Since the digits are hexadecimal, use a hex character class rather than \d:
(?<=[0-9A-F]{4}\s{2})([0-9A-F]{8}\s){4}(?=[\s|])
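For use with findall, a per-group lookaround may be easier to work with, since a repeated capture group only keeps its last repetition. The sketch below is a variation on the idea, not the pattern given above: it assumes each group has whitespace on both sides, and `GROUP` and `sample` are illustrative names. Python's re module requires a fixed-width lookbehind, which a single \s satisfies.

```python
import re

# Sample line from the question; the two spaces after 00A3 are assumed.
sample = "00A3  388A63FF 00000DF1 52984731 FF989ACB |"

# Eight hex digits with whitespace immediately before and after.
# The lookarounds check the surrounding whitespace without consuming it,
# so adjacent groups can share the single space between them.
GROUP = re.compile(r'(?<=\s)[0-9A-Fa-f]{8}(?=\s)')

numbers = GROUP.findall(sample)
print(numbers)  # ['388A63FF', '00000DF1', '52984731', 'FF989ACB']
```

The leading 00A3 is not matched because it has no whitespace before it and is shorter than eight digits.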
Upvotes: 1