Reputation: 235
I'm using grep within bash shell to find a series of hexadecimal bytes in files:
$ find . -type f -exec grep -ri "\x5B\x27\x21\x3D\xE9" {} \;
The search works, but without the -a option grep only reports that a binary file matched:
Binary file ./file_with_bytes matches
I would like to get the offset of each match. Is this possible? I'm open to using another, similar tool; I'm just not sure what it would be.
Upvotes: 1
Views: 5245
Reputation: 2388
After spending far more time than I would have liked trying to get macOS's grep (and various similar tools, including GNU grep) to output binary offsets, as this answer and others claim is possible, I stumbled across radare2's rafind2 command, which did exactly what I needed:
rafind2 – advanced command-line byte pattern search in files
We can install radare2 on macOS using Homebrew:
brew install radare2
While rafind2 can't be used to search multiple files recursively, it's quite powerful, and works well for finding offsets in a single file.
We can use the -x flag to search for a hex string:
-x [hex] search for hexpair string (909090) (can be used multiple times)
⇒ rafind2 -x "5B27213DE9" /path/to/the-bin
0xd0fdf
We can even use wildcard placeholders (either full bytes, or nibbles) by using . within the hex pattern passed with -x:
⇒ rafind2 -x "5B..213.E9" /path/to/the-bin
0xd0fdf
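For comparison, a rough grep equivalent of that wildcard search might look like the sketch below. It assumes GNU grep built with -P (PCRE) support, and the sample file is made up purely for illustration. A full-byte wildcard becomes . and a nibble wildcard such as 3. becomes the byte range [\x30-\x3F]. Because grep works line by line, a wildcard byte here can never match 0x0A.

```shell
# Hypothetical sample file: 3 filler bytes, then 5B 27 21 3D E9 (written as
# octal escapes so that a plain POSIX printf can produce them), then filler.
sample=$(mktemp)
printf 'abc\133\047\041\075\351xyz' > "$sample"

# -o: print only the match, -b: prefix it with its byte offset,
# -a: treat the binary file as text, -P: enable \xHH escapes.
# LC_ALL=C makes grep work on raw bytes rather than UTF-8 characters.
wild_offset=$(LC_ALL=C grep -obaP '\x5B.\x21[\x30-\x3F]\xE9' "$sample" | cut -d: -f1)

printf '0x%x\n' "$wild_offset"
rm -f "$sample"
```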
We know grep can do recursive (-r) matching within binary files:
⇒ grep -r '\x5B\x27\x21\x3D\xE9' ./path/to/bins
Binary file ./path/to/bins/aaa matches
Binary file ./path/to/bins/bbb matches
Binary file ./path/to/bins/ccc matches
By using -rl we can have grep search recursively and output only the matching filenames:
⇒ grep -rl '\x5B\x27\x21\x3D\xE9' ./path/to/bins
./path/to/bins/aaa
./path/to/bins/bbb
./path/to/bins/ccc
We can then combine this with rafind2 to extract all of the offsets:
SEARCH_DIRECTORY="./path/to/bins"
GREP_PATTERN='\x5B\x27\x21\x3D\xE9'
# Remove every '\x' from GREP_PATTERN to get the hex string rafind2 expects
# (e.g. '\x5B\x27\x21\x3D\xE9' becomes 5B27213DE9)
PATTERN="${GREP_PATTERN//\\x/}"
grep -rl "$GREP_PATTERN" "$SEARCH_DIRECTORY" | while IFS= read -r file; do
  echo "$file:"
  rafind2 -x "$PATTERN" "$file"
done
Which results in output like this:
./path/to/bins/aaa:
0x4b0060
./path/to/bins/bbb:
0x4b0060
./path/to/bins/ccc:
0x4bd1e0
We could also skip grep, and just use rafind2 with find / fd directly:
SEARCH_DIRECTORY="./path/to/bins"
PATTERN='5B27213DE9'
# Using find
find "$SEARCH_DIRECTORY" -type f -exec sh -c 'output=$(rafind2 -x "$1" "$2"); [ -n "$output" ] && echo "$2:" && echo "$output"' sh "$PATTERN" {} \;
# Using fd (the search path goes before --exec, which consumes all remaining arguments)
fd --type f . "$SEARCH_DIRECTORY" --exec sh -c 'output=$(rafind2 -x "$1" "$2"); [ -n "$output" ] && (echo "$2:"; echo "$output")' sh "$PATTERN" {}
Both of which would output like this:
./path/to/bins/aaa:
0x4b0060
./path/to/bins/bbb:
0x4b0060
./path/to/bins/ccc:
0x4bd1e0
As a rough idea of how each of these methods performs, here is the output from a single run of each with time:
⇒ time ./test-grep-and-rafind2
# ..snip..
./test-grep-and-rafind2 7.33s user 0.19s system 99% cpu 7.578 total
⇒ time ./test-find-and-rafind2
# ..snip..
./test-find-and-rafind2 3.24s user 0.72s system 98% cpu 4.041 total
⇒ time ./test-fd-and-rafind2
# ..snip..
./test-fd-and-rafind2 3.85s user 1.04s system 488% cpu 1.002 total
Upvotes: 1
Reputation: 1319
grep actually has an option for this:
-b, --byte-offset Print the 0-based byte offset within the input file
A simple example using this option:
$ grep -obarUP "\x01\x02\x03" /bin
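This prints both the filename and the byte offset of every match found under the directory. Here -o prints only the matching part, -b adds the byte offset, -a treats binary files as text, -r recurses, -U keeps the data in binary mode, and -P enables the Perl-style \xHH escapes: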
/bin/bash:772067:
/bin/bash:772099:
/bin/bash:772133:
/bin/bash:772608:
/bin/date:56160:
Notice that find is not actually needed, since the -r option already takes care of searching files recursively.
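To see the offsets on a file whose contents are known, here is a small sketch. It assumes GNU grep built with -P support; the sample file and variable names are made up for illustration. Running grep under LC_ALL=C is a precaution so that escapes such as \xE9 are matched as raw bytes rather than as UTF-8 characters.

```shell
# Hypothetical sample file: 3 filler bytes, then 5B 27 21 3D E9 (written as
# octal escapes so that a plain POSIX printf can produce them), then filler.
sample=$(mktemp)
printf 'abc\133\047\041\075\351xyz' > "$sample"

# With a single file argument grep prints "offset:match"; keep only the offset.
offset=$(LC_ALL=C grep -obUaP '\x5B\x27\x21\x3D\xE9' "$sample" | cut -d: -f1)

# grep reports decimal offsets; convert to hex to compare with other tools.
hex_offset=$(printf '0x%x' "$offset")
echo "$offset $hex_offset"
rm -f "$sample"
```

The decimal offsets that grep prints can be turned into the 0x... form used by hex editors with printf, as in the last step.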
Upvotes: 5
Reputation: 207445
Not at a computer, but use:
od -x yourFile
or
xxd yourFile
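to get it dumped in hex with offsets on the left side. Note that od prints offsets in octal by default, and od -x groups the bytes into 16-bit words in host byte order, so od -A x -t x1 yourFile is easier to search for a byte sequence.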
Sometimes your search string may not be found because its bytes do not appear contiguously in the dump but are split across two lines. You can pass the file through twice, though, with the first 4 bytes chopped off the second time, to make sure your string appears intact on one pass or the other. Then add the offset back on for the second pass, and sort and uniq the offsets.
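One way to sidestep the line-splitting problem altogether is to flatten the dump into a single hex string before searching it. The following is only a sketch: the sample file and variable names are made up, and it assumes the file is small enough for awk to hold as one line.

```shell
# Hypothetical sample file: 3 filler bytes, then 5B 27 21 3D E9 (written as
# octal escapes so that a plain POSIX printf can produce them), then filler.
sample=$(mktemp)
printf 'abc\133\047\041\075\351xyz' > "$sample"

pattern='5b27213de9'   # lowercase hex, two digits per byte

# od -An -v -t x1 dumps every byte as two hex digits with no offset column;
# tr then removes the spaces and newlines so no match can straddle a line.
offsets=$(od -An -v -t x1 "$sample" | tr -d ' \n' | awk -v pat="$pattern" '{
    start = 1
    while ((i = index(substr($0, start), pat)) > 0) {
        pos = start + i - 1        # 1-based position in the hex string
        if (pos % 2 == 1)          # odd position means a byte boundary
            print (pos - 1) / 2    # convert to a 0-based byte offset
        start = pos + 1
    }
}')

echo "$offsets"
rm -f "$sample"
```

The odd-position check discards hits that start in the middle of a byte, which would otherwise show up as false matches in the flattened string.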
Upvotes: 1