Reputation: 8978
How can I grep files for the special character ”
(note that it is different from ")?
I tried escaping it, but that won't work.
When I open the files with vim, it appears as <94>.
File example
<p>"hello”></p>
I want to be able to grep -rne "\”"
Upvotes: 1
Views: 94
Reputation: 189327
With modern GNU grep and properly configured locales, this should just work.
If your grep is not 8-bit savvy or your locales are hosed, maybe try e.g.
perl -ne 'print if /\x94/' files ...
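It's not too hard to reimplement grep -rn in Perl, but if this is a quick one-off, try the following (the close ARGV if eof part resets the line counter $. for each file, so the reported line numbers are per file like with grep -n)
find . -type f -exec perl -ne 'print "$ARGV:$.:$_" if /\x94/; close ARGV if eof' {} +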
In some sense your locale is hosed or at least marginally nonstandard if \x94 is displayed as a curly quote. Your system is apparently configured to use some legacy Windows 8-bit encoding...?
The curly quote isn't a shell or regex metacharacter so there should be no need to backslash it.
In more detail, based on the comments: the fundamental problem is that your system is set up to use UTF-8 but the file uses a different encoding. So grep "”" really searches for the UTF-8 encoding of U+201D, which is equivalent to perl -ne 'print if /\xe2\x80\x9d/'
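The mismatch can be seen by dumping the bytes side by side. This is a small sketch that assumes the script itself is saved as UTF-8 (so the literal ” is three bytes) and uses od, since it is available practically everywhere.

```shell
#!/bin/sh
# Bytes of the curly quote as typed in a UTF-8 script or terminal.
utf8_hex=$(printf '%s' '”' | od -An -tx1 | tr -d ' \n')

# The single byte that vim displays as <94> (octal 224 is hex 94).
file_hex=$(printf '\224' | od -An -tx1 | tr -d ' \n')

printf 'typed pattern: %s\nbyte in file:  %s\n' "$utf8_hex" "$file_hex"
```

The typed pattern is the three-byte sequence e2 80 9d, while the file holds the lone byte 94, so a byte-for-byte search for the former can never match the latter.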
If you don't know the byte value of the character, but you know the encoding, you can do
echo "”" | iconv -f utf-8 -t ENCODING | grep -f - files ...
Of course, you can easily obtain the byte value by similar means:
echo "”" | iconv -f utf-8 -t ENCODING | xxd
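As a worked instance of the two pipelines above, here is a sketch that assumes the files are Windows-1252 (a guess for illustration; substitute the real encoding). The file name sample.html is hypothetical. LC_ALL=C and -a are additions that are not part of the commands above; they are meant to make GNU grep compare raw bytes and not report the file as binary.

```shell
#!/bin/sh
# ASSUMPTION: the files are Windows-1252; replace with the real encoding.
encoding=WINDOWS-1252

demo_dir=$(mktemp -d)
# Hypothetical sample file containing the raw byte 0x94 (octal 224).
printf '<p>"hello\224></p>\n' > "$demo_dir/sample.html"

# Convert the quote from UTF-8 to the assumed encoding and dump it as hex.
byte_hex=$(printf '%s' '”' | iconv -f UTF-8 -t "$encoding" | od -An -tx1 | tr -d ' \n')

# Feed the converted character to grep as its pattern list (-f -).
# LC_ALL=C: treat input as plain bytes; -a: treat files as text.
grep_hits=$(
  cd "$demo_dir" &&
  printf '%s\n' '”' | iconv -f UTF-8 -t "$encoding" | LC_ALL=C grep -rna -f - .
)

printf '%s\n%s\n' "$byte_hex" "$grep_hits"
rm -rf "$demo_dir"
```

If the encoding guess is right, the hex dump shows the same byte value that vim displays, and grep reports the line in sample.html.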
or just view the file in a tool like less, which shows unknown bytes in hex.
Maybe see also https://tripleee.github.io/8bit#9d. With just a single byte, it doesn't matter which precise encoding the file is using (if it's HTML, the default in HTML 5 is, bewilderingly, Windows code page 1252), but if you have a few unknown bytes for which you know or can guess the expected rendering, this table can help you establish the precise encoding.
If your grep is not 8-bit savvy, maybe you are using equipment from the Museum of Retrocomputing. If your locale is weird, maybe troubleshoot that; ideally you want UTF-8 everywhere.
Upvotes: 2