Reputation: 91
I am using a regex to match BGP messages in a byte string. An example byte string looks like this:
b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\x00\x13\x04\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\x00\x13\x04'
\xff (8 times) is used as a "magic marker" to start a single message. Now I want to split the byte string into messages so I can parse each of them.
messages = re.split(b'\xff{8}', payload)
Matching works fine, but I get some empty elements in my messages list:
b''
b''
b'001304'
b''
b''
b'001304'
Can someone explain this behavior? Why are there two empty elements between each (correctly split) message? In larger byte strings there is sometimes just one empty element between messages.
Upvotes: 3
Views: 1395
Reputation: 627056
You can make it explicit that the quantifier applies to the whole \xff byte, not just to a trailing f (as in \xfffffffff), by wrapping the byte in a non-capturing group:
messages = re.split(b'(?:\xff){8}', payload)
                      ^^^    ^
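Also, your string contains more than one run of 8 consecutive \xff bytes in a row (16 of them before each message in your sample). Every run is treated as a separate delimiter, and the empty space between two adjacent delimiters becomes an empty element. You might want to use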
messages = re.split(b'(?:(?:\xff){8})+', payload)
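To see the difference, here is a minimal sketch that rebuilds the sample payload exactly as it is written in the question (the variable names per_run and collapsed are only for illustration) and splits it with both patterns:

```python
import re

# Sample payload as written in the question:
# 16 x 0xff followed by the 3 message bytes, repeated twice.
payload = (b'\xff' * 16 + b'\x00\x13\x04') * 2

# Every 8-byte run is its own delimiter, so two adjacent runs
# leave an empty bytes object between them.
per_run = re.split(b'(?:\xff){8}', payload)

# With the outer `+`, adjacent runs are consumed as one delimiter.
collapsed = re.split(b'(?:(?:\xff){8})+', payload)

print(per_run)    # [b'', b'', b'\x00\x13\x04', b'', b'\x00\x13\x04']
print(collapsed)  # [b'', b'\x00\x13\x04', b'\x00\x13\x04']
```

The number of empty elements depends on how many 8-byte runs sit next to each other, which would explain why you sometimes see one and sometimes two.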
However, that will still result in having an empty first element if the match is found at the start of the data. You may remove the part at the beginning before splitting:
messages = re.split(b'(?:(?:\xff){8})+', re.sub(b'^(?:(?:\xff){8})+', b'', payload))
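As a rough check of that approach, assuming the same sample payload as in the question (marker and trimmed are names introduced here only for readability):

```python
import re

# Sample payload as written in the question.
payload = (b'\xff' * 16 + b'\x00\x13\x04') * 2

# One or more consecutive 8-byte marker runs.
marker = b'(?:(?:\xff){8})+'

# Drop the marker at the very start, then split on the remaining ones.
trimmed = re.sub(b'^' + marker, b'', payload)
messages = re.split(marker, trimmed)

print(messages)  # [b'\x00\x13\x04', b'\x00\x13\x04']
```

Note that this only handles a leading marker. If the data could also end with a marker, the last element would still be empty.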
HOWEVER, the best idea is to just remove the empty elements with a list comprehension or with filter (kudos for testing goes to you):
messages = [x for x in re.split(b'(?:\xff){8}', payload) if x]
# Or, the fastest way here as per the comments
messages = list(filter(None, re.split(b'(?:\xff){8}', payload)))
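If you split in more than one place, the filter variant could be wrapped in a small helper. This is only a sketch: split_messages and MARKER are hypothetical names, and it assumes that an empty chunk never represents a real message.

```python
import re

# Precompiled 8-byte marker pattern (hypothetical module-level name).
MARKER = re.compile(b'(?:\xff){8}')

def split_messages(payload):
    """Split payload on the marker and drop the empty chunks."""
    # filter(None, ...) keeps only truthy items, i.e. non-empty bytes.
    return list(filter(None, MARKER.split(payload)))

# Sample payload as written in the question.
payload = (b'\xff' * 16 + b'\x00\x13\x04') * 2
print(split_messages(payload))  # [b'\x00\x13\x04', b'\x00\x13\x04']
```

Because the empty chunks are dropped afterwards, it does not matter how many marker runs are adjacent or whether the data starts or ends with one.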
See an updated Python 3 demo
Upvotes: 2