Reputation: 483
Previously, I was able to find binary patterns in files using grep with
grep -a -b -o -P '\x01\x02\x03' <file>
By find I mean I was able to get the byte position of the pattern in the file. But when I tried doing this with the latest version of grep (v2.16) it no longer worked.
Specifically, I can manually verify that the pattern is present in the file but grep does not find it. Strangely, some patterns are found correctly but not others. For example, in a test file
000102030405060708090a0b0c0e0f
'\x01\x02' is found but '\x07\x08' is not.
Any help in clarifying this behavior is highly appreciated.
Update: The above example does not show the described behavior. Here are the commands that exhibit the problem
printf `for (( x=0; x<256; x++ )); do printf "\x5cx%02x" $x; done` > test
for (( x=$((0x70)); x<$((0x8f)); x++ )); do
p=`printf "\'\x5cx%02x\x5cx%02x\'" $x $((x+1))`
echo -n $p
echo $p test | xargs grep -c -a -o -b -P | cut -d: -f1
done
The first line creates a file containing every possible byte value from 0x00 to 0xff in sequence. The loop then counts, for each byte value from 0x70 to 0x8e, how often that byte is immediately followed by its successor. The output I get is
'\x70\x71'1
'\x71\x72'1
'\x72\x73'1
'\x73\x74'1
'\x74\x75'1
'\x75\x76'1
'\x76\x77'1
'\x77\x78'1
'\x78\x79'1
'\x79\x7a'1
'\x7a\x7b'1
'\x7b\x7c'1
'\x7c\x7d'1
'\x7d\x7e'1
'\x7e\x7f'1
'\x7f\x80'0
'\x80\x81'0
'\x81\x82'0
'\x82\x83'0
'\x83\x84'0
'\x84\x85'0
'\x85\x86'0
'\x86\x87'0
'\x87\x88'0
'\x88\x89'0
'\x89\x8a'0
'\x8a\x8b'0
'\x8b\x8c'0
'\x8c\x8d'0
'\x8d\x8e'0
'\x8e\x8f'0
Update: The same pattern occurs for single-byte patterns -- no bytes with value greater than 0x7f are found.
Upvotes: 3
Views: 3796
Reputation: 1698
The results may depend on your current locale. To avoid this, use:
env LC_ALL=C grep -P "<binary pattern>" <file>
where env LC_ALL=C overrides your current locale with the C locale, in which grep matches byte by byte. Otherwise, in a multibyte locale such as UTF-8, patterns with non-ASCII "characters" such as \xff will not match.
For example, this fails to match because (at least in my case) the environment has LANG=en_US.UTF-8:
$ printf '\x41\xfe\n' | grep -P '\xfe'
whereas this succeeds:
$ printf '\x41\xfe\n' | env LC_ALL=C grep -P '\xfe'
A?
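To tie this back to the question, here is a minimal sketch of getting the byte offset of a pattern that crosses the 0x7f/0x80 boundary. It assumes GNU grep built with PCRE support (the -P option) and forces the C locale with LC_ALL=C; the scratch file and the variable names f, x and offset are only illustrative.

```shell
#!/bin/sh
# Sketch: locate a non-ASCII byte pair by forcing the C locale.
# Assumes GNU grep with -P (PCRE) support is installed.

# Hypothetical scratch file holding every byte value 0x00..0xff in order.
f=$(mktemp)
x=0
while [ "$x" -lt 256 ]; do
    # Emit one byte via an octal escape, which POSIX printf understands.
    printf "\\$(printf '%03o' "$x")"
    x=$((x + 1))
done > "$f"

# -a: treat the file as text, -b: print byte offsets, -o: only the match.
# LC_ALL=C makes grep (and cut) work on single bytes instead of UTF-8 characters.
offset=$(LC_ALL=C grep -a -b -o -P '\x7f\x80' "$f" | LC_ALL=C cut -d: -f1)
echo "offset: $offset"

rm -f "$f"
```

Since byte value N sits at offset N in this file, the script should print offset: 127. Running the same grep without LC_ALL=C in a UTF-8 locale is where the zero counts in the question come from.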
Upvotes: 1