radensb

Reputation: 694

Python Regular expressions for alpha numeric digits

I am learning regular expressions and trying to do the following:

Below is the format of a line containing a series of alphanumeric (hex) digits. It starts with 4 digits, followed by 2 spaces, then four groups of 8 digits, each followed by a single space, and finally a pipe character (|).

FFFF  FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF |

or written more clearly:

FFFF[space][space]FFFFFFFF[space]FFFFFFFF[space]FFFFFFFF[space]FFFFFFFF[space]|

I first need to find the correct line in a file, which starts with 00A3. Then I need to pull out the four 8-digit groups. I have sketched this out using:

import re

p = re.compile(r'00A3  ')        # search for 00A3[space][space]
r = re.compile(???)              # search for desired 8 digit groups
q = re.compile(r'\[SECTION2\]')  # marks the start of the next section

dataString = next(inFile) # 00A3  388A63FF 00000DF1 52984731 FF989ACB |
while not q.match(dataString): # reaching [SECTION2] means we passed where 00A3 would have been, so it must not be here
    if p.match(dataString):
        numbers = r.findall(dataString) # numbers = ['388A63FF', '00000DF1', '52984731', 'FF989ACB']
        break
    dataString = next(inFile) # get next line to check

This should give me a list of the numbers for further processing. I'm just not sure how to write the regex that will find only the 4 groups of 8 alphanumeric digits separated by a space. My thought was to look for 8 alphanumeric digits together, with a space in front and a space behind. Would that cause a problem, and how would it look?

I looked into the lookahead and lookbehind options, but I get confused.

I am still very new to this, especially in Python so I am open to suggestions on better implementation.

Thanks!

Upvotes: 3

Views: 597

Answers (4)

xgord

Reputation: 4776

You could use one regular expression for a single 8 digit group and then find all the matches in a line.

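import re

line = '00A3  388A63FF 00000DF1 52984731 FF989ACB |'  # sample line from the question

regex = re.compile(r' (\w{8})')  # a space followed by 8 captured word characters

groups = regex.findall(line)  # gives a list of the matches in order of appearance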
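To tie this back to the loop in the question, here is a minimal sketch of how the pieces might fit together. The sample lines, the constant names and the `find_groups` helper are all made up for illustration; in practice the lines would come from the open file.

```python
import re

# Hypothetical sample input; the real data would come from the file.
SAMPLE_LINES = [
    "[SECTION1]",
    "00A2  11111111 22222222 33333333 44444444 |",
    "00A3  388A63FF 00000DF1 52984731 FF989ACB |",
    "[SECTION2]",
    "00A3  AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD |",
]

LINE_START = re.compile(r"00A3  ")         # 00A3 followed by two spaces
GROUP = re.compile(r" (\w{8})")            # a space, then 8 captured word characters
SECTION_END = re.compile(r"\[SECTION2\]")  # start of the next section


def find_groups(lines):
    """Return the 8-character groups from the first 00A3 line before [SECTION2]."""
    for line in lines:
        if SECTION_END.match(line):
            break  # passed the section where 00A3 could appear
        if LINE_START.match(line):
            return GROUP.findall(line)
    return []


numbers = find_groups(SAMPLE_LINES)
print(numbers)
```

Iterating with a `for` loop avoids calling `next()` by hand, and the function returns an empty list if `[SECTION2]` is reached first.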

Upvotes: 2

snf

Reputation: 3077

If you are using findall, you should be ok with

\w{8}

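It matches every run of 8 word characters, which on these lines are the 8-digit hex numbers. Note that \w also accepts underscores and non-hex letters, so [0-9A-Fa-f]{8} is a stricter alternative.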
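As a rough illustration of the difference between the loose and strict patterns, here is a small sketch. The `tricky` line is invented for the comparison and is not from the question's data.

```python
import re

line = "00A3  388A63FF 00000DF1 52984731 FF989ACB |"

# Loose: any 8 consecutive word characters.
loose = re.findall(r"\w{8}", line)

# Strict: exactly 8 hex digits bounded by non-word characters.
strict = re.findall(r"\b[0-9A-Fa-f]{8}\b", line)

# Hypothetical line with extra tokens that are not 8-digit hex groups.
tricky = "00A3  388A63FF some_name1 0123456789ABCDEF |"
loose_tricky = re.findall(r"\w{8}", tricky)
strict_tricky = re.findall(r"\b[0-9A-Fa-f]{8}\b", tricky)

print(loose_tricky)   # also picks up pieces of the longer tokens
print(strict_tricky)  # only the standalone 8-digit hex group
```

On the well-formed line from the question both patterns give the same four groups, so the stricter version only matters if other tokens can appear on the line.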

Upvotes: 1

seth

Reputation: 1788

Re: a different implementation, here is one without regular expressions:

all_numbers = []
with open('input', 'r') as my_file:
    for line in my_file:
        fields = line.split()
        # Skip blank lines, which split into an empty list
        if fields and fields[0] == "00A3":
            numbers = fields[1:5]
            all_numbers.append(numbers)

numbers looks like ['388A63FF', '00000DF1', '52984731', 'FF989ACB'], and all_numbers is a list of every such numbers list found in the file.
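Since the question mentions further processing, here is a hedged sketch of the split approach followed by a conversion of the hex strings to integers. The `io.StringIO` object is only a stand-in for the real file, and its contents are made up around the sample line from the question.

```python
import io

# Hypothetical stand-in for the open file.
sample = io.StringIO(
    "[SECTION1]\n"
    "\n"
    "00A3  388A63FF 00000DF1 52984731 FF989ACB |\n"
    "[SECTION2]\n"
)

all_numbers = []
for line in sample:
    fields = line.split()
    # Blank lines split into an empty list, so check before indexing.
    if fields and fields[0] == "00A3":
        all_numbers.append(fields[1:5])

# Interpret each 8-character group as a base-16 integer.
as_ints = [[int(field, 16) for field in row] for row in all_numbers]
print(as_ints)
```

`int(field, 16)` raises `ValueError` on anything that is not valid hex, which doubles as a sanity check on the line format.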

Upvotes: 0

arshajii

Reputation: 129477

You can indeed use lookarounds:

(?<=[0-9A-Fa-f]{4}\s{2})([0-9A-Fa-f]{8}\s){4}(?=[\s|])
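A minimal sketch of how a lookaround pattern along these lines might be applied to the sample line from the question. It uses `[0-9A-Fa-f]` for the digits, because `\d` does not match the hex letters A-F, and a non-capturing group so that the whole match can be split afterwards. The variable names are illustrative only.

```python
import re

line = "00A3  388A63FF 00000DF1 52984731 FF989ACB |"

# Lookbehind: 4 hex digits and 2 spaces (fixed width, as Python's re requires).
# Body: four repetitions of 8 hex digits plus one whitespace character.
# Lookahead: whitespace or a pipe must follow.
pattern = re.compile(
    r"(?<=[0-9A-Fa-f]{4}\s{2})(?:[0-9A-Fa-f]{8}\s){4}(?=[\s|])"
)

match = pattern.search(line)
numbers = match.group(0).split() if match else []
print(numbers)

# With a capturing group that repeats, findall keeps only the last repetition.
last_only = re.findall(
    r"(?<=[0-9A-Fa-f]{4}\s{2})([0-9A-Fa-f]{8}\s){4}(?=[\s|])", line
)
print(last_only)
```

The second call shows why the repeated capturing group is awkward with `findall`: only the final group survives, so taking the full match and splitting it is one way to recover all four.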

Upvotes: 1
