IAmYourFaja
IAmYourFaja

Reputation: 56904

How to grep for presence of specific hex bytes in files?

My web app is displaying some bizarro output (unicode characters that shouldn't be there, etc.). The best I can reckon is that somehow I introduced a bad char somewhere in the source, but I can't figure out where.

I found this answer that states I can do something like:

grep -obUaP "<\x-hex pattern>" .

When I copy the unicode char out of the browser and into my Bless hex editor, it tells me that the exact bytes of the char are:

15 03 01 EF BF BD 02 02

How can I format <\xhex pattern> to match the exact bytes that I need. I tried:

grep -obUaP "<\x-15 03 01 EF BF BD 02 02>" .

But that doesn't work. Thoughts?

Upvotes: 5

Views: 7156

Answers (2)

Elliot
Elliot

Reputation: 2690

It may be easiest to write the pattern of hex bytes to a separate file and load that into stdin for the search.

In this example there is a file sampletext, consisting of the 256 sequential bytes and the occasional newline, and searchstring, a sequence of characters to grep for.

$ xxd sampletext 
00000000: 0001 0203 0405 0607 0809 0a0b 0c0d 0e0f  ................
00000010: 0a10 1112 1314 1516 1718 191a 1b1c 1d1e  ................
00000020: 1f0a 2021 2223 2425 2627 2829 2a2b 2c2d  .. !"#$%&'()*+,-
00000030: 2e2f 0a30 3132 3334 3536 3738 393a 3b3c  ./.0123456789:;<
00000040: 3d3e 3f0a 4041 4243 4445 4647 4849 4a4b  =>?.@ABCDEFGHIJK
00000050: 4c4d 4e4f 0a50 5152 5354 5556 5758 595a  LMNO.PQRSTUVWXYZ
00000060: 5b5c 5d5e 5f0a 6061 6263 6465 6667 6869  [\]^_.`abcdefghi
00000070: 6a6b 6c6d 6e6f 0a70 7172 7374 7576 7778  jklmno.pqrstuvwx
00000080: 797a 7b7c 7d7e 7f0a 8081 8283 8485 8687  yz{|}~..........
00000090: 8889 8a8b 8c8d 8e8f 0a90 9192 9394 9596  ................
000000a0: 9798 999a 9b9c 9d9e 9f0a a0a1 a2a3 a4a5  ................
000000b0: a6a7 a8a9 aaab acad aeaf 0ab0 b1b2 b3b4  ................
000000c0: b5b6 b7b8 b9ba bbbc bdbe bf0a c0c1 c2c3  ................
000000d0: c4c5 c6c7 c8c9 cacb cccd cecf 0ad0 d1d2  ................
000000e0: d3d4 d5d6 d7d8 d9da dbdc ddde df0a e0e1  ................
000000f0: e2e3 e4e5 e6e7 e8e9 eaeb eced eeef 0af0  ................
00000100: f1f2 f3f4 f5f6 f7f8 f9fa fbfc fdfe ff0a  ................

$ xxd searchstring 
00000000: 8081 8283                                ....

By redirecting searchstring into stdin, grep can look for the bytes directly

$ grep -a "$(<searchstring)" sampletext | xxd
00000000: 8081 8283 8485 8687 8889 8a8b 8c8d 8e8f  ................
00000010: 0a                                       .

$ grep -ao "$(<searchstring)" sampletext | xxd
00000000: 8081 8283 0a                             .....

Upvotes: 0

chacewells
chacewells

Reputation: 921

Check the post again. FrOsT is not including the '<' and '>' in his actual grep command. He only used the carats to enclose an example statement. His actual statement looks like this:

"\x01\x02"

not:

"<\x01\x02>"

I have a C source file on my computer that begins with the line:

#include <stdio.h>

When I run

grep -obUaP '\x69\x6E\x63\x6C\x75\x64\x65' io.c

I get

1:include

That is, the line number followed by only the string matching the pattern.

You may want to run

man grep

and find out what all those options mean.

Upvotes: 5

Related Questions