Jack O'Leary

Reputation: 235

Recursively find hexadecimal bytes in binary files

I'm using grep within bash shell to find a series of hexadecimal bytes in files:

$ find . -type f -exec grep -ri "\x5B\x27\x21\x3D\xE9" {} \;

The search works, but I know that without the -a option grep reports matches in binary files only as:

Binary file ./file_with_bytes matches

I would like to get the byte offset of each match. Is this possible? I'm open to using another, similar tool; I'm just not sure what it would be.

Upvotes: 1

Views: 5245

Answers (3)

After spending far more time than I would have liked trying to get macOS's grep (and similar tools, including GNU grep) to output binary offsets, as other answers claim is possible, I stumbled across radare2's rafind2 command, which did exactly what I needed:

rafind2 – advanced command-line byte pattern search in files

We can install radare2 on macOS using Homebrew:

brew install radare2

While rafind2 can't be used to search multiple files recursively, it's quite powerful, and works well for finding offsets in a single file.

We can use the -x flag to search for a hex string:

-x [hex] search for hexpair string (909090) (can be used multiple times)

⇒ rafind2 -x "5B27213DE9" /path/to/the-bin
0xd0fdf

We can even use wildcard placeholders (either full bytes, or nibbles) by using . within the hex pattern passed with -x:

⇒ rafind2 -x "5B..213.E9" /path/to/the-bin
0xd0fdf
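
To double-check a reported offset, it can help to dump the bytes at that position. This is especially handy with wildcards, since it shows what the . placeholders actually matched. A minimal sketch using only POSIX od and printf; bytes_at is a hypothetical helper name, not part of radare2:

```shell
# bytes_at OFFSET COUNT FILE
# Hypothetical helper (not part of radare2): print COUNT bytes of FILE as hex,
# starting at OFFSET. OFFSET may be hex (0xd0fdf) or decimal.
bytes_at() {
  # printf '%d' converts a 0x-prefixed offset to decimal for od's -j (skip) flag.
  od -An -v -tx1 -j "$(printf '%d' "$1")" -N "$2" < "$3" | tr -d ' \n'
  echo
}
```

For example, bytes_at 0xd0fdf 5 /path/to/the-bin should print 5b27213de9 if the offset above is right. The same printf conversion turns rafind2's hex offsets into decimal (0xd0fdf is 856031), which makes them comparable with tools that report decimal offsets.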

We know grep can do recursive (-r) matching within binary files:

⇒ grep -r '\x5B\x27\x21\x3D\xE9' ./path/to/bins
Binary file ./path/to/bins/aaa matches
Binary file ./path/to/bins/bbb matches
Binary file ./path/to/bins/ccc matches

By using -rl we can have grep recursively search, and output the matching filenames:

⇒ grep -rl '\x5B\x27\x21\x3D\xE9' ./path/to/bins
./path/to/bins/aaa
./path/to/bins/bbb
./path/to/bins/ccc

Which we can then combine with rafind2 to extract all of the offsets:

SEARCH_DIRECTORY="./path/to/bins"
GREP_PATTERN='\x5B\x27\x21\x3D\xE9'

# Remove all instances of '\x' from PATTERN for rafind2
# Eg. Becomes 5B27213DE9
PATTERN="${GREP_PATTERN//\\x/}"

# IFS= keeps leading/trailing whitespace in file names intact
grep -rl "$GREP_PATTERN" "$SEARCH_DIRECTORY" | while IFS= read -r file; do
  echo "$file:"
  rafind2 -x "$PATTERN" "$file"
done

Which results in output like this:

./path/to/bins/aaa:
0x4b0060
./path/to/bins/bbb:
0x4b0060
./path/to/bins/ccc:
0x4bd1e0

We could also skip grep, and just use rafind2 with find / fd directly:

SEARCH_DIRECTORY="./path/to/bins"
PATTERN='5B27213DE9'

# Using find
find "$SEARCH_DIRECTORY" -type f -exec sh -c 'output=$(rafind2 -x "$1" "$2"); [ -n "$output" ] && echo "$2:" && echo "$output"' sh "$PATTERN" {} \;

# Using fd (the search pattern and path must come before --exec, which consumes all remaining arguments)
fd --type f . "$SEARCH_DIRECTORY" --exec sh -c 'output=$(rafind2 -x "$1" "$2"); [ -n "$output" ] && (echo "$2:"; echo "$output")' sh "$PATTERN" {}

Both of which would output like this:

./path/to/bins/aaa:
0x4b0060
./path/to/bins/bbb:
0x4b0060
./path/to/bins/ccc:
0x4bd1e0

As a rough idea of how performant each of these methods is, see the following output from a single run of each with time:

⇒ time ./test-grep-and-rafind2
# ..snip..
./test-grep-and-rafind2  7.33s user 0.19s system 99% cpu 7.578 total

⇒ time ./test-find-and-rafind2
# ..snip..
./test-find-and-rafind2  3.24s user 0.72s system 98% cpu 4.041 total

⇒ time ./test-fd-and-rafind2
# ..snip..
./test-fd-and-rafind2  3.85s user 1.04s system 488% cpu 1.002 total

Upvotes: 1

etopylight

Reputation: 1319

grep actually has an option for this:

-b --byte-offset  Print the 0-based byte offset within the input file

A simple example using this option:

$ grep -obarUP "\x01\x02\x03" /bin

This prints both the filename and the byte offset of each match found under the directory:

/bin/bash:772067:
/bin/bash:772099:
/bin/bash:772133:
/bin/bash:772608:
/bin/date:56160:

Note that find is not needed here, since the -r option already takes care of the recursive file search.
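
One caveat when applying this to the pattern from the question: it contains \xE9, and in a UTF-8 locale GNU grep's -P may treat \xE9 as the character U+00E9 rather than the single byte 0xE9, so the match can be missed. Forcing the C locale avoids that. A minimal sketch, assuming GNU grep built with PCRE support; find_bytes is a hypothetical wrapper name:

```shell
# find_bytes PATTERN PATH
# Hypothetical wrapper: print "file:offset" for each match of a PCRE byte
# pattern (e.g. '\x5B\x27\x21\x3D\xE9') under PATH, searching recursively.
find_bytes() {
  # LC_ALL=C makes grep match raw bytes; cut drops the binary match text.
  # Assumes file names contain no ':' and the pattern matches no newline.
  LC_ALL=C grep -obUaPr -- "$1" "$2" | cut -d: -f1,2
}
```

Calling find_bytes '\x5B\x27\x21\x3D\xE9' . then prints one file:offset line per match, with the offsets in decimal.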

Upvotes: 5

Mark Setchell

Reputation: 207445

Not at a computer, but use:

od -x yourFile

or

xxd yourFile

to get it dumped in hex with offsets on the left side.

Sometimes your search string may not be found because its bytes do not appear contiguously in the dump but are split across two output lines. You can pass the file through twice though, with the first 4 bytes chopped off the second time, to make sure your string is found intact on one pass or the other. Then add the offset back on to the second pass's results, and sort and uniq the offsets.
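
Another way around the split-line problem is to flatten the whole dump onto a single line before searching, so no match can straddle a line break. This is a swapped-in technique rather than the two-pass approach described above; a minimal sketch, assuming GNU grep (for -o combined with -b) and a file small enough to handle as one long line. hex_offsets is a hypothetical helper name:

```shell
# hex_offsets HEXSTRING FILE
# Hypothetical helper: print the decimal byte offset of each occurrence of
# HEXSTRING (e.g. 5B27213DE9) in FILE.
hex_offsets() {
  # Normalise "5B27213DE9" to "5b 27 21 3d e9" to match od's output.
  needle=$(printf '%s' "$1" | tr 'A-F' 'a-f' | sed 's/../& /g; s/ $//')
  # Flatten the dump to " xx xx xx ..." on one line: byte i starts at
  # character 3*i + 1, so a character offset maps back to (offset - 1) / 3.
  od -An -v -tx1 < "$2" | tr '\n' ' ' | tr -s ' ' |
    grep -obF -- "$needle" |
    awk -F: '{ print ($1 - 1) / 3 }'
}
```

Keeping the spaces between byte pairs means a hit can never start in the middle of a byte. Note that grep -o reports non-overlapping matches only, so a pattern that overlaps itself may be under-counted.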

Upvotes: 1
