suffa
suffa

Reputation: 3796

Creating a regular expression that will search for strings converted to 12 digit hex values

I'm trying to create a regexp that will search for all 12 digit hex values in a file. For example: string 448699 => hex 343438363939. The regexp that I have now is:

(r'3[0-9]\d{10}')

it matches the first character to 3 the second 0-9 and the next 10 are any random digits. The hex above is 12 digits and starting with the first character, every other character is a 3. How would I express this with regexp. I was thinking along the lines of the following, but not sure:

(r'3[0-9]3[0-9]3[0-9]3[0-9]3[0-9]3[0-9]')

Upvotes: 0

Views: 514

Answers (3)

John Machin
John Machin

Reputation: 82934

Confusion reigns supreme ...

If you want to match the result of converting ANY 6-byte sequence to a 12-hex-digit string (what you asked for), you need [0-9a-fA-F]{12}.

If you want to match the result of converting a 6-byte sequence of ASCII decimal digits to a 12-hex-digit string (what your sample code indicates), you need (?:3[0-9]){6}.

Nitpick: You should NOT use \d as if you are using Unicode it will pick up any non-ASCII decimal digits, which are NOT hex digits (over 300 possibilities).

The suggestion (3[0-9a-fA-F]){6} detects 6-byte sequences of bytes drawn from 0123456789:;<=>? which is unlikely to be what you want.

Update with request for clarification.

Please consider the following and let us know which pattern is actually finding what you want it to find, and is not letting "false positives" through the gate.

>>> import re, binascii
>>> originals = ('123456', 'FOOBAR', ':;<=>?')
>>> data = ' '.join(map(binascii.hexlify, originals))
>>> print data
313233343536 464f4f424152 3a3b3c3d3e3f
>>> for pattern in (r'(?:3\d){6}', r'(3[0-9a-fA-F]){6}',
...             r'(?:3[0-9a-fA-F]){6}', r'[0-9a-fA-F]{12}'):
...     print repr(pattern), re.findall(pattern, data)
...
'(?:3\\d){6}' ['313233343536']
'(3[0-9a-fA-F]){6}' ['36', '3f']
'(?:3[0-9a-fA-F]){6}' ['313233343536', '3a3b3c3d3e3f']
'[0-9a-fA-F]{12}' ['313233343536', '464f4f424152', '3a3b3c3d3e3f']

Upvotes: 2

Justin Morgan
Justin Morgan

Reputation: 30760

You're really close. The pattern you want is:

(r'(?:3\d){6}')

Edit - As Mike Pennington pointed out, hex numbers can include letters A-F. I wasn't actually sure of the purpose of the "every other digit is 3" rule, so I left the rules as you described them.

Upvotes: 3

Mike Pennington
Mike Pennington

Reputation: 43077

You're almost there... since hex can have a-f you need to include those...

(r'(3[0-9a-fA-F]){6}')

Upvotes: 1

Related Questions