Daniel Genin

Reputation: 483

Using grep to find a binary pattern in a file

Previously, I was able to find binary patterns in files using grep with

grep -a -b -o -P '\x01\x02\x03' <file>

By "find" I mean that I was able to get the byte position of the pattern in the file. But when I tried this with the latest version of grep (v2.16), it no longer worked.

Specifically, I can manually verify that the pattern is present in the file but grep does not find it. Strangely, some patterns are found correctly but not others. For example, in a test file

000102030405060708090a0b0c0e0f

'\x01\x02' is found but not '\x07\x08'.

Any help in clarifying this behavior is highly appreciated.

Update: The above example does not show the described behavior. Here are the commands that exhibit the problem:

printf `for (( x=0; x<256; x++ )); do printf "\x5cx%02x" $x; done` > test

for (( x=$((0x70)); x<$((0x8f)); x++ )); do
    p=`printf "\'\x5cx%02x\x5cx%02x\'" $x  $((x+1))`
    echo -n $p
    echo $p test | xargs grep -c -a -o -b -P | cut -d: -f1
done

The first command creates a file containing every byte value from 0x00 to 0xff in sequence. The loop then counts the occurrences of each pair of consecutive byte values, starting at '\x70\x71' and ending at '\x8e\x8f'. The output I get is:

   '\x70\x71'1
   '\x71\x72'1
   '\x72\x73'1
   '\x73\x74'1
   '\x74\x75'1
   '\x75\x76'1
   '\x76\x77'1
   '\x77\x78'1
   '\x78\x79'1
   '\x79\x7a'1
   '\x7a\x7b'1
   '\x7b\x7c'1
   '\x7c\x7d'1
   '\x7d\x7e'1
   '\x7e\x7f'1
   '\x7f\x80'0
   '\x80\x81'0
   '\x81\x82'0
   '\x82\x83'0
   '\x83\x84'0
   '\x84\x85'0
   '\x85\x86'0
   '\x86\x87'0
   '\x87\x88'0
   '\x88\x89'0
   '\x89\x8a'0
   '\x8a\x8b'0
   '\x8b\x8c'0
   '\x8c\x8d'0
   '\x8d\x8e'0
   '\x8e\x8f'0

Update: The same pattern occurs for single-byte patterns -- no bytes with value greater than 0x7f are found.

Upvotes: 3

Views: 3796

Answers (1)

Dr. Alex RE

Reputation: 1698

The results depend on your current locale. In a UTF-8 locale, grep -P interprets the pattern and the input as UTF-8 text rather than as raw bytes, so single bytes above 0x7f are not matched. To avoid this, force the C locale:

env LC_ALL=C grep -P "<binary pattern>" <file>

where env LC_ALL=C overrides every locale setting for that one command, so grep matches byte by byte. Otherwise, patterns with non-ASCII "characters" such as \xff will not match.

For example, this fails to match because (at least in my case) the environment has LANG=en_US.UTF-8:

$ printf '\x41\xfe\n' | grep -P '\xfe'

whereas this succeeds:

$ printf '\x41\xfe\n' | env LC_ALL=C grep -P '\xfe'
A?
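
To get the byte positions asked about in the question, the same locale override can be combined with the -a -b -o options from the original command. The following is only a minimal sketch, assuming GNU grep built with PCRE support (-P). The helper names find_bytes and make_test_file are hypothetical and are not part of grep or of the question.

```shell
# Hypothetical helper: print the byte offset of every match of a PCRE
# byte pattern in a file, one offset per line.
#   $1 = pattern such as '\x80\x81'
#   $2 = file to search
find_bytes() {
    # LC_ALL=C : match byte by byte regardless of the user's locale
    # -a       : treat the file as text even if it contains NUL bytes
    # -b -o    : print "offset:match" for each match
    # cut keeps only the offset, i.e. the text before the first colon.
    LC_ALL=C grep -a -b -o -P -- "$1" "$2" | LC_ALL=C cut -d: -f1
}

# Hypothetical helper: write every byte value 0x00..0xff, in order,
# to the file named by $1 (same content as the question's test file).
make_test_file() {
    x=0
    while [ "$x" -lt 256 ]; do
        # The inner printf builds an octal escape such as \200,
        # which the outer printf turns into a single byte.
        printf "\\$(printf '%03o' "$x")"
        x=$((x + 1))
    done > "$1"
}
```

With a file created by make_test_file test, running find_bytes '\x80\x81' test should print 128, the offset of a pair that the UTF-8 locale run reported as missing. Because grep works line by line, a pattern that would have to span a newline byte (0x0a) will still not be found this way.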

Upvotes: 1
