Bridgey

Why does Python regex seem to fail to match beyond 112 bytes?

I have a file, what.dmp, which is 116 bytes long, and my Python code looks like this:

import binascii
import re
import sys

print(sys.version)

needle = re.compile(b".{112}")

with open("what.dmp", "rb") as haystack:
  chunk = haystack.read()
  print("Read {0} bytes.".format(len(chunk)))
  matches = needle.search(chunk)
  if matches:
    print(matches.start())
    print(binascii.hexlify(matches.group(0)))
  else:
    print("No matches found.")

Running this code is fine:

C:\test>C:\Python33\python.exe test.py
3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:06:53) [MSC v.1600 64 bit (AMD64)]
Read 116 bytes.
0
b'0101060001010600087e88758f4e8e75534589751df7897583548775e4bcf001e6d0f001cae3f001ccf7f0010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000090d91300000000002c003100eb6fb024'

However, if I change the regex from 112 to 113:

needle = re.compile(b".{113}")

And no match is found:

C:\test>C:\Python33\python.exe test.py
3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:06:53) [MSC v.1600 64 bit (AMD64)]
Read 116 bytes.
No matches found.

So the question is: why does the regex not match the 113th byte? I haven't posted what.dmp because surely the contents are irrelevant?!

Many thanks!

Answers (1)

Andrew Clark

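Byte 113 (index 112) is almost certainly a newline, \n (10 in decimal, 0a in hex). Without the re.DOTALL flag, . matches any byte except \n. Every 113-byte window of a 116-byte buffer starts at index 0 to 3 and therefore covers index 112, so a newline there makes .{113} fail everywhere, while .{112} can still match at index 0. Try adding the re.DOTALL flag to your regex.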
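As a minimal sketch of the effect, assuming that byte 113 really is a newline, the snippet builds a hypothetical 116-byte stand-in for what.dmp (the real file's contents are unknown) and compares the pattern with and without re.DOTALL:

```python
import re

# Hypothetical stand-in for what.dmp: 112 non-newline bytes, then a
# newline (0x0a) at index 112, then 3 more bytes (116 bytes total).
data = b"\x01" * 112 + b"\n" + b"\x02" * 3

short = re.compile(b".{112}")                # what the question used first
plain = re.compile(b".{113}")                # "." skips b"\n" here
dotall = re.compile(b".{113}", re.DOTALL)    # "." also matches b"\n"

print(len(data))                     # 116
print(data.find(b"\n"))              # 112
print(short.search(data).start())    # 0
print(plain.search(data))            # None
print(dotall.search(data).start())   # 0
```

The inline form (?s) at the start of the pattern has the same effect as passing re.DOTALL.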

However, as noted in the comments, you probably don't need regular expressions for this.
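If the goal is simply to grab a fixed number of bytes, plain file reads avoid the newline issue entirely. This is a sketch under that assumption; first_n_bytes is a hypothetical helper, and the demo writes a temporary 116-byte file with a newline at index 112 rather than using the real what.dmp:

```python
import binascii
import os
import tempfile

def first_n_bytes(path, n):
    """Hypothetical helper: return the first n bytes of the file at path,
    or None if the file is shorter than n bytes.

    Reading bytes directly treats b"\n" like any other byte, so no
    DOTALL-style flag is involved.
    """
    with open(path, "rb") as f:
        chunk = f.read(n)
    return chunk if len(chunk) == n else None

# Demo file: 112 bytes of 0x01, a newline, then 3 bytes of 0x02.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x01" * 112 + b"\n" + b"\x02" * 3)
    path = tmp.name

try:
    head = first_n_bytes(path, 113)
    print(len(head))                        # 113
    print(binascii.hexlify(head[-1:]))      # b'0a'
    print(first_n_bytes(path, 117))         # None (file is only 116 bytes)
finally:
    os.remove(path)
```

If the data is already in memory, slicing (chunk[:113]) does the same job.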