codeholic

Reputation: 5848

Malformed UTF-8 character error in regular expression in Perl

I get a 'Malformed UTF-8 character' error when I pass some scalar data to XML::Simple or Data::Dumper. The lines where the error occurs contain regular expressions.

Malformed UTF-8 character (fatal) at /usr/share/perl5/XML/Simple.pm line 1690.
Malformed UTF-8 character (fatal) at /usr/lib/perl/5.10/Data/Dumper.pm line 682.

So far I have not been able to reproduce the error with a small piece of code.

XML::Simple 2.18
Data::Dumper 2.124
perl v5.10.1

Upvotes: 0

Views: 5044

Answers (3)

codeholic

Reputation: 5848

The problem arose because somewhere deep in the application code, Encode::_utf8_on was called on a scalar that wasn't a valid UTF-8 string.
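A minimal sketch of this failure mode, assuming the offending data was Latin-1 bytes (the sample string `"caf\xE9"` is made up for illustration). Encode::_utf8_on is documented as an internal function: it only flips the scalar's UTF-8 flag and does not check or convert the bytes, so the result can be a malformed string that later blows up inside a regex. Encode::decode, by contrast, validates the bytes while producing a character string.

```perl
use strict;
use warnings;
use Encode ();

# Illustrative input: "caf\xE9" is Latin-1 for "café"; the lone byte 0xE9 is NOT valid UTF-8.
my $bytes = "caf\xE9";

# Reproduction of the bug (hypothetical data): _utf8_on only sets the
# internal UTF-8 flag, it neither checks nor converts the bytes.
my $bad = $bytes;
Encode::_utf8_on($bad);
print utf8::valid($bad) ? "flagged and valid\n" : "flagged but malformed\n";

# Safer alternative: decode() checks the bytes while building a character string.
# With FB_CROAK it dies on malformed input instead of returning a broken scalar.
# decode() with a CHECK value may modify its source, so decode a copy.
my $copy = $bytes;
my $good = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK()) };
print defined $good ? "decoded\n" : "rejected: $@";
```

Replacing the `_utf8_on` call with a `decode` like this makes bad input fail at the point where it enters the program, rather than deep inside XML::Simple or Data::Dumper.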

Upvotes: 2

Grant McLean

Reputation: 6998

You could try piping your data through Encoding::FixLatin. If the 'binary' bytes you're encountering are actually Latin-1 characters, they'll get converted to valid UTF-8. If they really are random binary bytes, they should at least get converted to random (but valid) UTF-8 characters :-)
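Encoding::FixLatin is a CPAN module whose `fix_latin()` function does this conversion. The sketch below is not that module. It is a rough, core-only approximation of the same idea, using only the Encode module: try strict UTF-8 first and fall back to Latin-1. The helper name `to_characters` is hypothetical. Unlike `fix_latin()`, it decides once per whole string, so it does not handle strings that mix UTF-8 and Latin-1 bytes.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: whole-string approximation of the Encoding::FixLatin idea.
sub to_characters {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with FB_CROAK may modify its source, so use a copy
    my $chars = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK()) };
    return $chars if defined $chars;    # input was valid UTF-8
    # Every byte is a valid Latin-1 character, so this fallback cannot fail.
    return Encode::decode('ISO-8859-1', $bytes);
}

my $from_utf8   = to_characters("caf\xC3\xA9");    # UTF-8 bytes for "café"
my $from_latin1 = to_characters("caf\xE9");        # Latin-1 byte for "é"
print $from_utf8 eq $from_latin1 ? "same\n" : "different\n";
```

If your data may mix encodings within one string, the real module is the better choice: `use Encoding::FixLatin qw(fix_latin); my $text = fix_latin($input);`.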

Upvotes: 1

Eugene Yarmash

Reputation: 149736

The core Encode module provides facilities for Handling Malformed Data (the CHECK argument of `decode`/`encode`). I have never used them myself, though.
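A short sketch of three documented CHECK behaviours of `Encode::decode`, using a made-up malformed input. The default substitutes U+FFFD for bad bytes. `FB_CROAK` dies on the first bad byte (combined here with `LEAVE_SRC` so the input is left untouched). A code reference is called with the ordinal of each bad byte, and its return value is used as the replacement.

```perl
use strict;
use warnings;
use Encode ();

my $bytes = "caf\xE9";    # illustrative input: 0xE9 on its own is not valid UTF-8

# 1. Default CHECK: malformed bytes become U+FFFD REPLACEMENT CHARACTER.
my $replaced = Encode::decode('UTF-8', $bytes);

# 2. FB_CROAK: die on the first malformed byte; LEAVE_SRC keeps $bytes unmodified.
my $ok = eval {
    Encode::decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    1;
};
print $ok ? "decoded\n" : "croaked: $@";

# 3. Code reference: receives the ordinal of each bad byte and returns its
#    replacement text, here a visible \xHH escape instead of U+FFFD.
my $escaped = Encode::decode('UTF-8', $bytes, sub { sprintf '\\x%02X', shift });
print "$escaped\n";
```

Which one fits depends on the data: substitution keeps processing going, croaking surfaces the problem early, and the callback makes bad bytes easy to spot in logs or dumps.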

Upvotes: 0
