Vitali Kotik

Reputation: 753

Python: Detect all strings in binary file?

strings is a GNU/Linux utility that prints the sequences of printable characters found in files.

Is there any way to do what strings does but in Python?

Calling strings and grabbing the output is not an option in my case.

Upvotes: 1

Views: 3999

Answers (4)

Martin Evans

Reputation: 46779

The following prints a list of every run of 4 or more ASCII letters (a-z, A-Z):

import re

with open(r"my_binary_file", "rb") as f_binary:
    # The file is read as bytes, so the pattern must be a bytes pattern too (Python 3)
    print(re.findall(rb"[a-zA-Z]{4,}", f_binary.read()))

Matching only letters cuts down on non-text matches, but it will of course miss digits, punctuation and spaces that you may be looking for. The minimum length of 4 is also the default that strings uses.
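To get closer to what strings reports, one option is to match any run of printable ASCII bytes (0x20 to 0x7E) instead of letters only. This is a minimal sketch under that assumption, not the answerer's code; the function name `find_printable_runs` and the sample bytes are hypothetical. It also returns the offset of each match, similar to what GNU strings prints with `-t d`.

```python
import re

# Printable ASCII is 0x20 (space) through 0x7E (~); require at least 4 bytes,
# the same default minimum length that strings uses
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def find_printable_runs(data):
    """Yield (offset, text) for each printable ASCII run in a bytes object."""
    for match in PRINTABLE_RUN.finditer(data):
        yield match.start(), match.group().decode("ascii")


# Made-up sample input standing in for the contents of a binary file
sample = b"\x00\x01ELF\x02\x03hello world\xff\xfeabc\x00more text"

for offset, text in find_printable_runs(sample):
    print(offset, text)
# 7 hello world
# 24 more text
```

For a real file, pass `f_binary.read()` from a file opened with `"rb"`. The output may still differ from GNU strings in details; for example, strings also treats a tab as a printable character.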

Upvotes: 2

Cthulhu

Reputation: 1372

The following finds all runs of at least 4 printable ASCII characters (the same minimum that strings uses by default) in a bytes object:

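def strings(data, min_length=4):
    # Replace every byte outside printable ASCII (0x20-0x7E) with a NUL separator;
    # iterating over bytes yields ints in Python 3
    cleansed = bytes(b if 0x20 <= b <= 0x7E else 0 for b in data)
    # Split on the separator and keep only runs that are long enough
    return [s for s in cleansed.split(b"\x00") if len(s) >= min_length]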

Upvotes: 0

Jason Hu

Reputation: 6333

If you don't need output identical to strings, a quick approach is to read the file as bytes and decode it as ASCII while ignoring all decoding errors:

In Python 2:

with open('file', 'rb') as fd:
    print fd.read().decode('ascii', errors='ignore')

In Python 3:

import codecs
with open('file', 'rb') as fd:  # binary mode, so read() returns bytes
    print(codecs.decode(fd.read(), 'ascii', errors='ignore'))

In either version, errors='ignore' makes the decoder silently skip every byte that is not valid ASCII instead of raising an error.
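A small illustration of what errors='ignore' does, using made-up sample bytes: only bytes outside the ASCII range (0x80 to 0xFF) are dropped. Control bytes such as NUL (0x00) are valid ASCII and are kept, and there is no minimum length, so short fragments like "ab" survive. This is why the output can look quite different from what strings prints.

```python
# Hypothetical sample: 0x00 and 0x01 are valid ASCII, 0xFF and 0xFE are not
data = b"ab\x00\x01cd\xff\xfehello"

decoded = data.decode("ascii", errors="ignore")

# Only the non-ASCII bytes (0xFF, 0xFE) are removed; the control bytes remain
print(repr(decoded))  # 'ab\x00\x01cdhello'
```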

Further reference (Python 2): https://docs.python.org/2/library/codecs.html

Python 3: https://docs.python.org/3.5/library/codecs.html

Upvotes: 2

drum

Reputation: 5651

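Check each byte to see whether it falls in the range 0x20 to 0x7E; bytes in that range are printable ASCII characters (0x7F is DEL, a control character). Collect consecutive printable bytes and print each run once it reaches the minimum length.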
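A minimal sketch of this byte-by-byte approach, assuming Python 3 and a minimum run length of 4 like strings. The names `iter_strings` and `read_chunks` are hypothetical helpers. The scanner takes an iterable of byte chunks, so a large file can be processed block by block and a run that spans a chunk boundary is still reported whole.

```python
def iter_strings(chunks, min_length=4):
    """Yield printable ASCII runs (0x20-0x7E) of at least min_length bytes.

    `chunks` is any iterable of bytes objects, e.g. blocks read from a file.
    """
    current = bytearray()
    for chunk in chunks:
        for byte in chunk:  # iterating over bytes yields ints in Python 3
            if 0x20 <= byte <= 0x7E:
                current.append(byte)
            else:
                # A non-printable byte ends the current run
                if len(current) >= min_length:
                    yield current.decode("ascii")
                current.clear()
    # Flush a run that reaches the end of the input
    if len(current) >= min_length:
        yield current.decode("ascii")


def read_chunks(path, size=65536):
    """Read a file in fixed-size binary blocks (hypothetical helper)."""
    with open(path, "rb") as f:
        while True:
            block = f.read(size)
            if not block:
                break
            yield block


# "hel" + "lo" is split across two chunks but is reported as one run
print(list(iter_strings([b"\x00hel", b"lo\x01abc\x02text"])))  # ['hello', 'text']
```

For a file on disk, this would be used as `for s in iter_strings(read_chunks("my_binary_file")): print(s)`. Keeping the state in `current` between chunks avoids loading the whole file into memory.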

Upvotes: 1
