bnx
bnx

Reputation: 427

Misunderstanding Python bytes with files and regex

I'm trying to modify byte data, my files are read using the b modifier.

Sometimes the search will contain a wildcard, so I am using re - it supports byte patterns.

For example

re.finditer(b'\x00\x01\x00\x03\x7c', file)

My problem is that \x7c is interpereted as the | operator in regex.

In all my other cases the byte search works fine, how do I stop re parsing this? Am I misunderstanding re or byte data?

Thanks!

> python --version
Python 3.8.1

Upvotes: 2

Views: 66

Answers (1)

Green Cloak Guy
Green Cloak Guy

Reputation: 24691

Inserting an escape character into the regex (the backslash \x5c) to escape the pipe should work:

re.finditer(b'\x00\x01\x00\x03\x5c\x7c', file)

Worth noting is that you can input string/character literals in bytestrings and they work fine, which makes it slightly more comprehensible:

re.finditer(b'\x00\x01\x00\x03\|', file)

Or you could use re.escape() first, which is a more extensible and general-purpose solution:

re.finditer(re.escape(b'\x00\x01\x00\x03\x7c'), file)

Upvotes: 3

Related Questions