Reputation: 13
I am parsing a string that contains file magic numbers, but the formatting is inconsistent. Some of the patterns are in hex with the format '\x0a' (where the string holds an escaped backslash, so I apparently need to search for both backslashes), others are the direct ASCII characters, and the rest are somewhere in between.
I was hoping to write a regular expression to find the characters in a string that are not already hex. I attempted the following search for hex values, wrapped in a negative lookahead.
(?!\\\\x[0-9 a-f]{2})
This did not work as intended: the lookahead rejects the position of the backslash, but it succeeds one character later at the x, so I get an empty match there.
>>> import re
>>> test = "\\x50K\\x03\\x04"
>>> re.search("(?!\\\\x[0-9 a-f]{2})", test)
<re.Match object; span=(1, 1), match=''>
Short of collecting the positive matches and inverting them myself, I am not sure how to proceed.
Thanks!
Upvotes: 0
Views: 55
Reputation: 593
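You can replace the hex escapes with nothing, like this:
re.sub(r'\\x[0-9a-f]{2}', '', your_line)
and use what remains, which is the non-hex characters. (Note there is no space inside the character class; with a space, as in your pattern, a space would also count as a hex digit.)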
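For reference, here is a minimal sketch of that idea applied to the test string from the question. The names HEX_ESCAPE, non_hex_text and non_hex_runs are made up for illustration, and accepting uppercase hex digits is an assumption on my part, since the question only shows lowercase ones. The second helper uses re.split in case you need the leftover characters as separate runs rather than one joined string.

```python
import re

# One escaped hex byte such as \x0a: a literal backslash, 'x', two hex digits.
# Allowing A-F as well as a-f is an assumption; drop it if your data is lowercase only.
HEX_ESCAPE = re.compile(r'\\x[0-9a-fA-F]{2}')

def non_hex_text(line):
    """Return the input with every hex escape removed."""
    return HEX_ESCAPE.sub('', line)

def non_hex_runs(line):
    """Return the non-empty runs of characters found between hex escapes, in order."""
    return [run for run in HEX_ESCAPE.split(line) if run]

test = "\\x50K\\x03\\x04"
print(non_hex_text(test))   # K
print(non_hex_runs(test))   # ['K']
```

This avoids the lookahead entirely: instead of asking "where does a hex escape not start", it consumes the escapes and keeps whatever is left over.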
Upvotes: 1