vtable

Reputation: 334

Find all words in binary buffer using Python

I want to find all the "words" in a binary buffer (bytes) that are built from ASCII lowercase letters and digits and are exactly 5 characters long.

For example:

b'a\x1109ert\x01\x03a54bb\x05' contains a54bb and 09ert.

Note that a run such as abcdef121212 is longer than 5 characters, so I don't want it.

I have built this set of allowed byte values:

import string
allowed = {ord(c) for c in string.ascii_lowercase + string.digits}
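As a minimal sketch of how such a set could be used directly (the function name find_words is hypothetical), a single pass over the buffer can track maximal runs of allowed bytes and keep only the runs that are exactly 5 long. Iterating over a bytes object yields ints, which is why storing ord values in the set works. A pure-Python loop like this is likely slower than a regex, since the regex engine runs in C.

```python
import string
from typing import List

# Byte values of a-z and 0-9, as in the question's set
ALLOWED = {ord(c) for c in string.ascii_lowercase + string.digits}

def find_words(buffer: bytes, length: int = 5) -> List[bytes]:
    """Return every maximal run of ALLOWED bytes that is exactly `length` long."""
    words = []
    start = None  # index where the current run began, or None outside a run
    for i, b in enumerate(buffer):  # iterating bytes yields ints
        if b in ALLOWED:
            if start is None:
                start = i
        elif start is not None:
            # The run just ended at i; keep it only if it has the exact length
            if i - start == length:
                words.append(buffer[start:i])
            start = None
    # Handle a run that extends to the end of the buffer
    if start is not None and len(buffer) - start == length:
        words.append(buffer[start:])
    return words

print(find_words(b'a\x1109ert\x01\x03a54bb\x05'))  # [b'09ert', b'a54bb']
```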

What is the fastest way to do that using Python?

Upvotes: 2

Views: 140

Answers (1)

juanpa.arrivillaga

Reputation: 95993

My instinct would be to just go with regex here:

>>> import re
>>> buffer = b'a\x1109ert\x01\x03a54bb\x05'
>>> re.findall(b"[a-z0-9]{5}", buffer)
[b'09ert', b'a54bb']
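One caveat with {5} on its own: nothing requires the run to stop there, so a longer run gets cut into 5-byte slices. A quick check with the question's abcdef121212 example (using the lowercase-and-digits class the question asks for):

```python
import re

# {5} matches any 5 consecutive allowed bytes, even inside a longer run
print(re.findall(rb"[a-z0-9]{5}", b"abcdef121212"))  # [b'abcde', b'f1212']
```

So if longer runs must be excluded, the match needs a length check or a boundary condition.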

EDIT:

After your clarification, I would first find every run of allowed characters:

re.findall(b"[a-z0-9]+", buffer)

and then keep only the runs of exactly length 5:

[x for x in re.findall(b"[a-z0-9]+", buffer) if len(x) == 5]
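If you'd rather do it in a single regex pass, one alternative (not part of the answer above) is a sketch using lookbehind and lookahead assertions, so a 5-byte match counts only when no allowed byte sits immediately before or after it. The helper names lookaround and filter_runs are hypothetical. Which version is faster will depend on your data, so the sketch times both rather than assuming a winner.

```python
import re
import timeit

# Exactly five allowed bytes, with no allowed byte immediately before or after
EXACT5 = re.compile(rb"(?<![a-z0-9])[a-z0-9]{5}(?![a-z0-9])")
# Every maximal run of allowed bytes
RUNS = re.compile(rb"[a-z0-9]+")

def lookaround(buffer):
    return EXACT5.findall(buffer)

def filter_runs(buffer):
    return [x for x in RUNS.findall(buffer) if len(x) == 5]

buffer = b'a\x1109ert\x01\x03a54bb\x05abcdef121212\x00'
print(lookaround(buffer))   # [b'09ert', b'a54bb']
print(filter_runs(buffer))  # [b'09ert', b'a54bb']

# Compare speeds on a larger buffer; timings vary by data and machine
big = buffer * 10_000
print(timeit.timeit(lambda: lookaround(big), number=10))
print(timeit.timeit(lambda: filter_runs(big), number=10))
```

Precompiling the patterns with re.compile avoids repeated lookups in the module's pattern cache when the search runs in a loop.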

Upvotes: 2
