Reputation: 1497
I am parsing a very large log file with Perl. The code is:
open($input_handle, '<:encoding(UTF-8)', $input_file)
    or die("Can't open \"$input_file\": $!\n");
while (<$input_handle>) {
...
}
close($input_handle);
However, sometimes the log file contains faulty characters, and I get the following message:
utf8 "\xD0" does not map to Unicode at log_parser.pl line 32, <$input_handle> line 10920.
I am aware of these characters and would just like to ignore them, without the message flooding my (Windows!) build server logs. I tried no warnings 'utf8'; but it did not help.
How can I suppress the message?
Upvotes: 1
Views: 1411
Reputation: 385546
You could do the decoding yourself instead of using the :encoding layer. By default, Encode's decode and decode_utf8 simply replace each malformed sequence with U+FFFD rather than warning.
$ perl -e'
use Encode qw( decode_utf8 );
$bytes = "\xD0 \x92 \xD0\x92\n";
$text = decode_utf8($bytes);
printf("U+%v04X\n", $text);
'
U+FFFD.0020.FFFD.0020.0412.000A
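Applied to the read loop from the question, this could look roughly like the sketch below. It assumes the file can be opened as raw bytes and decoded one line at a time; process_log and its callback argument are hypothetical names introduced for illustration, not part of Encode or of the original script.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Hypothetical helper: reads $input_file as raw bytes and passes each
# decoded line to $callback. With Encode's default CHECK value, malformed
# sequences are replaced with U+FFFD and no warning is emitted.
sub process_log {
    my ($input_file, $callback) = @_;

    # ':raw' means no decoding layer (and no CRLF translation on Windows).
    open(my $input_handle, '<:raw', $input_file)
        or die("Can't open \"$input_file\": $!\n");

    while (my $bytes = <$input_handle>) {
        # Splitting on "\n" before decoding is safe for UTF-8, because the
        # byte 0x0A never occurs inside a multi-byte sequence.
        my $text = decode('UTF-8', $bytes);
        $callback->($text);
    }

    close($input_handle);
    return;
}
```

It would be called as process_log($input_file, sub { my ($text) = @_; ... }); with the body of the original while loop inside the callback. Because the handle is opened with :raw, lines from a Windows log keep their "\r\n" endings, so they may need to be stripped explicitly.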
If the file is a mix of UTF-8, iso-8859-1 and cp1252, it may be possible to fix the file rather than simply silencing the errors, as detailed here.
Upvotes: 3