Reputation: 2564
Is it possible to search for a set of non-ASCII chars such as 'ï¿½' in a file in unix?
I want to search for all these characters in bash and replace them with two spaces.
sed -i 's/[ï¿½]/  /g' filename
This finally worked.
Upvotes: 0
Views: 202
Reputation: 154921
The way to search for those chars will depend on their encoding in the file. If the file is in the UTF-8 encoding, you can set the UTF-8 locale and simply match them from the shell. Assuming GNU sed (the default on Linux), the command line will look like this:
LANG=C.UTF-8 sed -i 's/[ï¿½]/  /g' filename
For this to work, you must be in a UTF-8-compliant shell, so that e.g. echo 'ï' | wc -c
outputs 3 (two UTF-8 code units plus a newline).
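As a concrete check, here is a minimal sketch (assuming GNU sed for the -i flag, and a printf that understands the POSIX octal escapes) that plants U+FFFD in a test file and replaces it with two spaces, as the question asked:

```shell
# U+FFFD is the UTF-8 byte sequence ef bf bd (octal 357 277 275).
# Create a sample file containing that character between two words.
printf 'foo\357\277\275bar\n' > demo.txt
# Replace it with two spaces in place; the sed pattern is built
# from the same raw bytes via command substitution.
LANG=C.UTF-8 sed -i "s/$(printf '\357\277\275')/  /g" demo.txt
cat demo.txt    # prints: foo  bar
```

The filename demo.txt is just a placeholder for this sketch; substitute your own file.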
Upvotes: 1
Reputation: 189387
You seem to be looking at UTF-8 data using a Latin-1 tool. Hence, your question is basically ill-defined, but assuming you want to find files containing a UTF-8 replacement character, try something like
perl -CSD -nle 'if (/\x{FFFD}/) { print $ARGV; close(ARGV) }' files ...
Here's what I used to understand your question:
$ echo -n 'ï¿½' | iconv -t iso-8859-1 | xxd
0000000: efbf bd
Googling for efbfbd quickly brought up http://www.fileformat.info/info/unicode/char/0fffd/index.htm among the top hits.
Note also that U+FFFD is basically an error code. You should probably not find and replace it; instead, find out which earlier encoding step failed and produced it, and fix that.
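To see how such characters arise, here is a sketch of the round trip described above (assuming iconv is available and printf understands octal escapes): the three UTF-8 bytes of U+FFFD, mis-read as Latin-1 and re-encoded, become the six-byte mojibake 'ï¿½':

```shell
# U+FFFD is the UTF-8 byte sequence ef bf bd (octal 357 277 275).
# A Latin-1 tool sees those as three separate characters (ï ¿ ½);
# re-encoding them as UTF-8 doubles the data to six bytes.
printf '\357\277\275' | iconv -f iso-8859-1 -t utf-8 | xxd -p
# prints: c3afc2bfc2bd
```

Reversing the direction (iconv -f utf-8 -t iso-8859-1) is exactly the step the answer above used to recover the bytes ef bf bd from the question's text.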
Upvotes: 1