Ferenc Deak

Reputation: 35408

Count the number of occurrences of binary data

I need to count the occurrences of the byte sequence 0xFF 0x84 0x03 0x07 in a binary file without too much hassle. Is there a quick way to grep for this data from the Linux command line, or should I write dedicated code to do it?

Upvotes: 9

Views: 6961

Answers (5)

mwfearnley

Reputation: 3669

Patterns without linebreaks

If your version of grep supports the -P option, you can use grep -a -P to search for an arbitrary binary string (with no linebreaks) inside a binary file. This is close to what you want:

grep -a -c -P '\xFF\x84\x03\x07' myfile.bin
  • -a ensures that binary files will not be skipped

  • -c outputs the count

  • -P specifies that your pattern is a Perl-compatible regular expression (PCRE), which allows strings to contain hex characters in the above \xNN format.

Unfortunately, grep -c only counts the number of "lines" the pattern appears on, not the actual occurrences.

To get the exact number of occurrences with grep, it seems you need to do:

grep -a -o -P '\xFF\x84\x03\x07' myfile.bin | wc -l

grep -o separates out each match onto its own line, and wc -l counts the lines.
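The difference between the two commands is easy to see on a small test file. The following is a minimal sketch, assuming GNU grep built with -P support; the sample bytes and the function names count_lines and count_matches are made up for illustration. LC_ALL=C is an addition that is not part of the commands above: it is meant to make grep treat \xFF as a single raw byte rather than as a character in a UTF-8 locale.

```shell
# Pattern from the question, written as a PCRE with hex escapes.
pattern='\xFF\x84\x03\x07'

# Number of "lines" containing the pattern (what grep -c reports).
count_lines() {
  LC_ALL=C grep -a -c -P "$pattern" "$1"
}

# Number of individual matches: one output line per match, then count the lines.
count_matches() {
  LC_ALL=C grep -a -o -P "$pattern" "$1" | wc -l
}

# Hypothetical sample: two matches before a newline byte, one after it.
# Octal escapes are used so that printf behaves the same in any POSIX shell.
sample=$(mktemp)
printf '\377\204\003\007AA\377\204\003\007\n\377\204\003\007' > "$sample"

count_lines "$sample"     # 2: the pattern appears on two "lines"
count_matches "$sample"   # 3: the pattern occurs three times
rm -f "$sample"
```

The first two occurrences share a "line", so grep -c undercounts, while grep -o reports each one separately.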

Patterns containing linebreaks

If you do need to grep for linebreaks, one workaround I can think of is to use tr to swap the character for another one that's not in your search term.

# set up test file (0a is newline)
xxd -r <<< '0:08 09 0a 0b 0c 0a 0b 0c' > test.bin

# grep for '\xa\xb\xc' doesn't work
grep -a -o -P '\xa\xb\xc' test.bin | wc -l

# swap newline with oct 42 and grep for that
tr '\n\042' '\042\n' < test.bin | grep -a -o -P '\042\xb\xc' | wc -l

(Note that 042 octal is the double quote " sign in ASCII.)

Another way, if your string doesn't contain Nulls (0x0), would be to use the -z flag, and swap Nulls for linebreaks before passing to wc.

grep -a -o -P -z '\xa\xb\xc' test.bin | tr '\0\n' '\n\0' | wc -l

(Note that -z and -P may be experimental in conjunction with each other. But with simple expressions and no Nulls, I would guess it's fine.)

Upvotes: 8

entheh

Reputation: 958

This doesn't quite answer your question, but does solve the problem when the search string is ASCII but the file is binary:

sed 's/SearchString/SearchString\n/g' binaryfile | grep -c SearchString

Basically, 'grep' was almost there except it only counted one occurrence if there was no newline byte in between, so I added the newline bytes.
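To illustrate, here is a small sketch of the same idea wrapped in a function, assuming GNU sed (which accepts \n in the replacement text) and a search string without regex metacharacters. The function name count_ascii and the sample bytes are hypothetical, and the LC_ALL=C and -a settings are precautions added here for binary input.

```shell
# $1 = file, $2 = plain ASCII search string (no regex metacharacters).
# sed appends a newline after every match, so each match ends up on its
# own line and grep -c can count them.
count_ascii() {
  LC_ALL=C sed "s/$2/&\n/g" "$1" | LC_ALL=C grep -a -c "$2"
}

# Hypothetical sample: three copies of the string mixed with binary bytes,
# two of them directly adjacent and none separated by a newline.
sample=$(mktemp)
printf 'SearchString\000\001SearchStringSearchString\377' > "$sample"

count_ascii "$sample" SearchString   # 3
rm -f "$sample"
```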

Upvotes: 0

Chris Seymour

Reputation: 85785

How about:

$ hexdump -v -e '1/1 "%02x "' a.out | grep -o 'ff 84 03 07' | wc -l

Upvotes: 0

hiteshradia

Reputation: 357

Use hexdump like this:

hexdump -v -e '"0x" 1/1 "%02X" " "' <filename> | grep -o "0xFF 0x84 0x03 0x07" | wc -l

hexdump prints every byte of the file in the form 0xNN, separated by spaces; -v stops it from collapsing repeated data.

grep -o prints each occurrence of the string on its own line, so repeated matches are all counted individually.

wc -l counts those lines, which gives you the final count.
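As a quick check of the hexdump approach, here is a sketch that runs it on a small test file and counts one output line per match. It assumes the util-linux or BSD hexdump and a grep that supports -o; the function name count_hex_pattern and the sample bytes are made up for illustration.

```shell
# $1 = file, $2 = pattern in the same "0xNN 0xNN ..." form that hexdump emits.
count_hex_pattern() {
  hexdump -v -e '"0x" 1/1 "%02X" " "' "$1" | grep -o "$2" | wc -l
}

# Hypothetical sample: three occurrences, separated by a NUL byte and a newline.
sample=$(mktemp)
printf '\377\204\003\007\000\377\204\003\007\012\377\204\003\007' > "$sample"

count_hex_pattern "$sample" '0xFF 0x84 0x03 0x07'   # 3
rm -f "$sample"
```

Because every byte is printed as its own 0xNN token, a match can only start on a byte boundary, and newline or NUL bytes in the file do not affect the count.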

Upvotes: 1

Kent

Reputation: 195049

Did you try grep -a?

From the grep man page:

-a, --text
              Process a binary file as if it were text; this is equivalent to the --binary-files=text option.

Upvotes: 0
