Reputation: 56904
My web app is displaying some bizarro output (unicode characters that shouldn't be there, etc.). The best I can reckon is that somehow I introduced a bad char somewhere in the source, but I can't figure out where.
I found this answer that states I can do something like:
grep -obUaP "<\x-hex pattern>" .
When I copy the unicode char out of the browser and into my Bless hex editor, it tells me that the exact bytes of the char are:
15 03 01 EF BF BD 02 02
How can I format <\xhex pattern> to match the exact bytes that I need? I tried:
grep -obUaP "<\x-15 03 01 EF BF BD 02 02>" .
But that doesn't work. Thoughts?
Upvotes: 5
Views: 7156
Reputation: 2690
It may be easiest to write the hex bytes to a separate file and pass that file's contents to grep as the search pattern.
In this example there is a file, sampletext, consisting of the 256 sequential byte values with an occasional newline, and a file, searchstring, holding the sequence of bytes to grep for.
$ xxd sampletext
00000000: 0001 0203 0405 0607 0809 0a0b 0c0d 0e0f ................
00000010: 0a10 1112 1314 1516 1718 191a 1b1c 1d1e ................
00000020: 1f0a 2021 2223 2425 2627 2829 2a2b 2c2d .. !"#$%&'()*+,-
00000030: 2e2f 0a30 3132 3334 3536 3738 393a 3b3c ./.0123456789:;<
00000040: 3d3e 3f0a 4041 4243 4445 4647 4849 4a4b =>?.@ABCDEFGHIJK
00000050: 4c4d 4e4f 0a50 5152 5354 5556 5758 595a LMNO.PQRSTUVWXYZ
00000060: 5b5c 5d5e 5f0a 6061 6263 6465 6667 6869 [\]^_.`abcdefghi
00000070: 6a6b 6c6d 6e6f 0a70 7172 7374 7576 7778 jklmno.pqrstuvwx
00000080: 797a 7b7c 7d7e 7f0a 8081 8283 8485 8687 yz{|}~..........
00000090: 8889 8a8b 8c8d 8e8f 0a90 9192 9394 9596 ................
000000a0: 9798 999a 9b9c 9d9e 9f0a a0a1 a2a3 a4a5 ................
000000b0: a6a7 a8a9 aaab acad aeaf 0ab0 b1b2 b3b4 ................
000000c0: b5b6 b7b8 b9ba bbbc bdbe bf0a c0c1 c2c3 ................
000000d0: c4c5 c6c7 c8c9 cacb cccd cecf 0ad0 d1d2 ................
000000e0: d3d4 d5d6 d7d8 d9da dbdc ddde df0a e0e1 ................
000000f0: e2e3 e4e5 e6e7 e8e9 eaeb eced eeef 0af0 ................
00000100: f1f2 f3f4 f5f6 f7f8 f9fa fbfc fdfe ff0a ................
$ xxd searchstring
00000000: 8081 8283 ....
By expanding the contents of searchstring into the pattern argument with $(<searchstring), grep can look for the bytes directly:
$ grep -a "$(<searchstring)" sampletext | xxd
00000000: 8081 8283 8485 8687 8889 8a8b 8c8d 8e8f ................
00000010: 0a .
$ grep -ao "$(<searchstring)" sampletext | xxd
00000000: 8081 8283 0a .....
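For anyone wanting to reproduce this without the original files, here is a minimal sketch. The file contents are a shortened stand-in for sampletext, not the real thing, and it makes three assumptions that are not in the answer above: $(cat searchstring) is used as a portable equivalent of $(<searchstring), LC_ALL=C is set so grep compares raw bytes instead of decoding UTF-8, and -F is added so the bytes are taken literally rather than as a regular expression. The -b option is also added to show where the match sits.

```shell
# Work in a scratch directory so no real files are touched.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Write the four search bytes 0x80 0x81 0x82 0x83 using octal escapes,
# which POSIX printf supports (\x escapes are not portable across shells).
printf '\200\201\202\203' > searchstring

# A small stand-in for sampletext: "abc", a newline, then bytes 0x7f to 0x84.
printf 'abc\n\177\200\201\202\203\204\n' > sampletext

# -a: treat binary input as text      -o: print only the matching part
# -b: print the byte offset of it     -F: pattern is a fixed string
# LC_ALL=C (assumption): compare raw bytes, no UTF-8 decoding.
match_offset=$(LC_ALL=C grep -aobF "$(cat searchstring)" sampletext | cut -d: -f1)
echo "match starts at byte offset $match_offset"
```

One caveat with this approach: command substitution drops trailing newlines and cannot carry NUL bytes, so it only suits patterns that contain neither.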
Upvotes: 0
Reputation: 921
Check the post again. FrOsT is not including the '<' and '>' in his actual grep command. He only used the angle brackets to enclose a placeholder. His actual pattern looks like this:
"\x01\x02"
not:
"<\x01\x02>"
I have a C source file on my computer that begins with the line:
#include <stdio.h>
When I run
grep -obUaP '\x69\x6E\x63\x6C\x75\x64\x65' io.c
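I get
1:include
That is, the byte offset of the match (from -b) followed by only the text that matched the pattern (from -o). The offset is 1 because the match starts right after the '#' at offset 0.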
You may want to run
man grep
and find out what all those options mean.
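Applied to the eight bytes from the question, the same \xHH form could look like the sketch below. The file name page.html and the scratch directory are hypothetical, made up only so the example has something to search. It assumes GNU grep built with -P (PCRE) support, and it sets LC_ALL=C, which is not in the original command: in a UTF-8 locale, -P may read \xEF as the character U+00EF rather than the single byte 0xEF.

```shell
# Hypothetical test data: "hello", a newline, then the eight bytes
# 15 03 01 EF BF BD 02 02 written as octal escapes for portability.
tmpdir=$(mktemp -d)
printf 'hello\n\025\003\001\357\277\275\002\002\n' > "$tmpdir/page.html"

# Each byte becomes \xHH, with no spaces and no angle brackets.
pattern='\x15\x03\x01\xEF\xBF\xBD\x02\x02'

# -r: recurse into directories     -l: list only the names of matching files
# -U: treat files as binary        -a: still process binary files as text
# -P: Perl-compatible regex, which is what understands \xHH
found=$(LC_ALL=C grep -rlUaP "$pattern" "$tmpdir")
echo "found in: $found"

# -o: print only the match         -b: prefix it with its byte offset
offset=$(LC_ALL=C grep -obUaP "$pattern" "$tmpdir/page.html" | cut -d: -f1)
echo "byte offset: $offset"
```

To search a real source tree, replace "$tmpdir" with the project directory, or with . as in the question.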
Upvotes: 5