Reputation: 9915
I'm extracting data from a binary file with regexes, and I've hit a regex problem I can't track down.
This is the code I'm having issues with:
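    # Debug output: dump the leading bytes of string as hex
    z = 0
    for char in string:
        self.response.out.write('|%s' % char.encode('hex'))
        z += 1
        if z > 20:
            self.response.out.write('<br>')
            break

    # Capture the title and strip the matched prefix from string
    title = []
    string = re.sub('^\x72.([^\x7A]+)', lambda match: append_match(match, title), string, 1)
    print_info('Title', title)

    def append_match(match, collection, replace = ''):
        # Store the captured group, then return the replacement text
        collection.append(match.group(1))
        return replace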
This is the hex content of the first 21 chars in string when this runs:
|72|0a|50|79|72|65|20|54|72|6f|6c|6c|7a|19|54|72|6f|6c|6c|62|6c
It returns nothing, unless I remove the ^, in which case it returns "Troll" (without the quotes), which is 54726F6C6C. As I read the pattern, it should return everything up to the \x7a.
What's going on here?
Upvotes: 0
Views: 834
Reputation: 214949
The problem is that \x0A (= newline) isn't matched by the dot by default. Try adding the DOTALL flag to your pattern, for example:
re.sub('(?s)^\x72.([^\x7A]+)....
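To make the difference concrete, here is a minimal sketch that rebuilds the 21 bytes from the question and runs the pattern with and without (?s). It assumes Python 3 with bytes patterns (the question's code is Python 2), and the names data, no_flag and rest are only illustrative.

```python
import re

# The 21 bytes shown in the question's hex dump.
data = bytes.fromhex("720a507972652054726f6c6c7a1954726f6c6c626c")

def append_match(match, collection, replace=b""):
    # Store the captured group, then return the replacement (bytes here).
    collection.append(match.group(1))
    return replace

# Without DOTALL: "." cannot consume the \x0a after \x72, so nothing matches.
no_flag = []
re.sub(rb"^\x72.([^\x7A]+)", lambda m: append_match(m, no_flag), data, count=1)

# With the inline (?s) flag: "." matches the newline and the group
# captures everything up to the first \x7a.
title = []
rest = re.sub(rb"(?s)^\x72.([^\x7A]+)", lambda m: append_match(m, title), data, count=1)

print(no_flag)  # []
print(title)    # [b'Pyre Troll']
print(rest)     # b'z\x19Trollbl'
```

Passing flags=re.DOTALL to re.sub should be equivalent to the inline (?s), if you'd rather keep the flag out of the pattern string.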
Upvotes: 2