Joren

Reputation: 9915

Python Regex not matching at start of string?

I'm going through a binary file with regexes to extract data, and I'm having a problem with one regex that I can't track down.

This is the code I'm having issues with:

        z = 0
        for char in string:
            self.response.out.write('|%s' % char.encode('hex'))
            z+=1
            if z > 20:
                self.response.out.write('<br>')
                break

        title = []
        string = re.sub('^\x72.([^\x7A]+)', lambda match: append_match(match, title), string, 1)
        print_info('Title', title)

def append_match(match, collection, replace = ''):
    collection.append(match.group(1))
    return replace

This is the content of the first 21 chars in string when this runs:

|72|0a|50|79|72|65|20|54|72|6f|6c|6c|7a|19|54|72|6f|6c|6c|62|6c

It returns nothing unless I remove the ^, in which case it returns "Troll" (without the quotes), which is 54726F6C6C. As I read it, it should return everything up to the \x7a.

What's going on here?

Upvotes: 0

Views: 834

Answers (1)

georg

Reputation: 214949

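The problem is that the second byte of your string, \x0A (a newline), isn't matched by the dot by default, so the pattern can't match at the start of the string. Without the ^, the regex simply finds a later \x72 that is followed by a non-newline byte. Try adding the dotall flag to your pattern, for example with the inline (?s):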

re.sub('(?s)^\x72.([^\x7A]+)....
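To see the difference, here is a minimal sketch that rebuilds the 21 bytes from the hex dump in the question and runs the same pattern with and without the flag. It assumes Python 3 and a bytes pattern (the question's code is Python 2), and the names data, pattern, no_flag, with_flag, unanchored and title are only illustrative:

```python
import re

# The 21 bytes shown in the question's hex dump (reconstructed here for illustration).
data = bytes.fromhex("720a507972652054726f6c6c7a1954726f6c6c626c")

# Same pattern as in the question, written as a bytes pattern for Python 3.
pattern = rb"^\x72.([^\x7A]+)"

# Default behaviour: "." does not match the newline byte at index 1,
# so the anchored pattern cannot match at all.
no_flag = re.match(pattern, data)

# re.DOTALL is the flag form of the inline "(?s)": "." now matches any byte.
with_flag = re.match(pattern, data, re.DOTALL)
title = with_flag.group(1)

# Dropping the "^" lets the engine skip ahead to the next \x72 that is
# followed by a non-newline byte, which explains the partial result.
unanchored = re.search(rb"\x72.([^\x7A]+)", data).group(1)

print(no_flag)
print(title)
print(unanchored)
```

Passing re.DOTALL and prefixing the pattern with (?s) are equivalent; the inline form is handy when the pattern is passed through code that doesn't expose a flags argument.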

Upvotes: 2
