Ivan Smirnov

Reputation: 4435

Bytewise grep fails for some byte values

I am studying the gzip format and tried to grep for its magic bytes, 1F 8B, in a sample archive, following the instructions from this answer.

xxd a.gz

Output:

00000000: 1f8b 0800 43dc 605b 0003 4bcb cf4f 4a2c  ....C.`[..K..OJ,
00000010: e202 0047 972c b207 0000 00              ...G.,.....

grep -obUaP "\x1f" a.gz

Output:

0:

grep -obUaP "\x8b" a.gz

Output:

# Nothing is printed

For some reason, grep finds one byte but not the other. After some investigation, our best guess is that it fails on bytes with the most significant bit set, but we couldn't find a reasonable explanation.

Why does this happen, and is there a workaround?

Upvotes: 2

Views: 154

Answers (1)

Ignacio Vazquez-Abrams

Reputation: 798606

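Probably because grep is working with UTF-8; when you search for "\x8b", it treats that as the character U+008B, whose UTF-8 encoding is the two bytes 0xc2 0x8b, so the lone byte 0x8b in the file never matches. "\x1f" is below 0x80 and encodes as a single byte, which is why that search succeeds. You will need to either find some way to disable grep's UTF-8 support, or switch to a tool that strictly interprets the search criteria as binary values.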
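
To illustrate the explanation, here is a minimal sketch in Python 3 (my choice of a tool that compares raw bytes, not something the question uses). The sample data is copied from the xxd dump in the question, and `find_all` is a hypothetical helper name introduced only for this example.

```python
# Bytes of a.gz, copied from the xxd dump in the question.
data = bytes.fromhex(
    "1f8b080043dc605b00034bcbcf4f4a2c"
    "e2020047972cb207000000"
)


def find_all(haystack: bytes, needle: bytes) -> list[int]:
    """Return every byte offset at which needle occurs in haystack."""
    offsets = []
    start = haystack.find(needle)
    while start != -1:
        offsets.append(start)
        start = haystack.find(needle, start + 1)
    return offsets


# Interpreted as text, "\x8b" is the code point U+008B, and its UTF-8
# encoding is two bytes rather than the single byte 0x8b.
utf8_pattern = "\x8b".encode("utf-8")
print(utf8_pattern.hex())             # c28b
print(find_all(data, utf8_pattern))   # [] (no 0xc2 0x8b pair in the file)

# Searching for raw bytes finds the gzip magic number as expected.
print(find_all(data, b"\x8b"))        # [1]
print(find_all(data, b"\x1f\x8b"))    # [0]
```

If you would rather stay with grep, running the same command under a single-byte locale (for example, with `LC_ALL=C` set for that invocation) is a commonly suggested way to turn off the UTF-8 interpretation; I have not verified it against your grep build, so treat it as something to try rather than a guarantee.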

Upvotes: 3
